[jira] [Comment Edited] (HIVE-24066) Hive query on parquet data should identify if column is not present in file schema and show NULL value instead of Exception

2022-07-14 Thread Seonguk Kim (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17563720#comment-17563720
 ] 

Seonguk Kim edited comment on HIVE-24066 at 7/14/22 6:01 AM:
-

I am facing the same problem. I hope the null check for `context.os` works.

(null check for struct column that not exists in file)


was (Author: JIRAUSER292443):
null check support for `context.os` would be useful.

(null check for struct column that not exists in file)

> Hive query on parquet data should identify if column is not present in file 
> schema and show NULL value instead of Exception
> ---
>
> Key: HIVE-24066
> URL: https://issues.apache.org/jira/browse/HIVE-24066
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.5, 3.1.2
>Reporter: Jainik Vora
>Priority: Major
> Attachments: day_01.snappy.parquet
>
>
> I created a hive table containing columns with struct data type 
>   
> {code:java}
> CREATE EXTERNAL TABLE test_dwh.sample_parquet_table (
>   `context` struct<
> `app`: struct<
> `build`: string,
> `name`: string,
> `namespace`: string,
> `version`: string
> >,
> `device`: struct<
> `adtrackingenabled`: boolean,
> `advertisingid`: string,
> `id`: string,
> `manufacturer`: string,
> `model`: string,
> `type`: string
> >,
> `locale`: string,
> `library`: struct<
> `name`: string,
> `version`: string
> >,
> `os`: struct<
> `name`: string,
> `version`: string
> >,
> `screen`: struct<
> `height`: bigint,
> `width`: bigint
> >,
> `network`: struct<
> `carrier`: string,
> `cellular`: boolean,
> `wifi`: boolean
>  >,
> `timezone`: string,
> `userAgent`: string
> >
> ) PARTITIONED BY (day string)
> STORED as PARQUET
> LOCATION 's3://xyz/events'{code}
>  
>  All columns are nullable hence the parquet files read by the table don't 
> always contain all columns. If any file in a partition doesn't have 
> "context.os" struct and if "context.os.name" is queried, Hive throws an 
> exception as below. Same for "context.screen" as well.
>   
> {code:java}
> 2020-10-23T00:44:10,496 ERROR [db58bfe6-d0ca-4233-845a-8a10916c3ff1 
> main([])]: CliDriver (SessionState.java:printError(1126)) - Failed with 
> exception java.io.IOException:java.lang.RuntimeException: Primitive type 
> osshould not doesn't match typeos[name]
> 2020-10-23T00:44:10,496 ERROR [db58bfe6-d0ca-4233-845a-8a10916c3ff1 
> main([])]: CliDriver (SessionState.java:printError(1126)) - Failed with 
> exception java.io.IOException:java.lang.RuntimeException: Primitive type 
> osshould not doesn't match typeos[name]java.io.IOException: 
> java.lang.RuntimeException: Primitive type osshould not doesn't match 
> typeos[name] 
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147)
>   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2208)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
>   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:787)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: java.lang.RuntimeException: Primitive type osshould not doesn't 
> match typeos[name] 
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:330)
>  
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:322)
>  
>   at 
> 

[jira] [Comment Edited] (HIVE-24066) Hive query on parquet data should identify if column is not present in file schema and show NULL value instead of Exception

2022-07-08 Thread Seonguk Kim (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17563720#comment-17563720
 ] 

Seonguk Kim edited comment on HIVE-24066 at 7/8/22 6:25 AM:


null check support for `context.os` would be useful.

(null check for struct column that not exists in file)


was (Author: JIRAUSER292443):
It would be useful if null check for context.os works.

(null check for struct column that not exists in file)

> Hive query on parquet data should identify if column is not present in file 
> schema and show NULL value instead of Exception
> ---
>
> Key: HIVE-24066
> URL: https://issues.apache.org/jira/browse/HIVE-24066
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.5, 3.1.2
>Reporter: Jainik Vora
>Priority: Major
> Attachments: day_01.snappy.parquet
>
>
> I created a hive table containing columns with struct data type 
>   
> {code:java}
> CREATE EXTERNAL TABLE test_dwh.sample_parquet_table (
>   `context` struct<
> `app`: struct<
> `build`: string,
> `name`: string,
> `namespace`: string,
> `version`: string
> >,
> `device`: struct<
> `adtrackingenabled`: boolean,
> `advertisingid`: string,
> `id`: string,
> `manufacturer`: string,
> `model`: string,
> `type`: string
> >,
> `locale`: string,
> `library`: struct<
> `name`: string,
> `version`: string
> >,
> `os`: struct<
> `name`: string,
> `version`: string
> >,
> `screen`: struct<
> `height`: bigint,
> `width`: bigint
> >,
> `network`: struct<
> `carrier`: string,
> `cellular`: boolean,
> `wifi`: boolean
>  >,
> `timezone`: string,
> `userAgent`: string
> >
> ) PARTITIONED BY (day string)
> STORED as PARQUET
> LOCATION 's3://xyz/events'{code}
>  
>  All columns are nullable hence the parquet files read by the table don't 
> always contain all columns. If any file in a partition doesn't have 
> "context.os" struct and if "context.os.name" is queried, Hive throws an 
> exception as below. Same for "context.screen" as well.
>   
> {code:java}
> 2020-10-23T00:44:10,496 ERROR [db58bfe6-d0ca-4233-845a-8a10916c3ff1 
> main([])]: CliDriver (SessionState.java:printError(1126)) - Failed with 
> exception java.io.IOException:java.lang.RuntimeException: Primitive type 
> osshould not doesn't match typeos[name]
> 2020-10-23T00:44:10,496 ERROR [db58bfe6-d0ca-4233-845a-8a10916c3ff1 
> main([])]: CliDriver (SessionState.java:printError(1126)) - Failed with 
> exception java.io.IOException:java.lang.RuntimeException: Primitive type 
> osshould not doesn't match typeos[name]java.io.IOException: 
> java.lang.RuntimeException: Primitive type osshould not doesn't match 
> typeos[name] 
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147)
>   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2208)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
>   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:787)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: java.lang.RuntimeException: Primitive type osshould not doesn't 
> match typeos[name] 
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:330)
>  
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:322)
>  
>   at 
> 

[jira] [Comment Edited] (HIVE-24066) Hive query on parquet data should identify if column is not present in file schema and show NULL value instead of Exception

2022-07-07 Thread Seonguk Kim (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17563720#comment-17563720
 ] 

Seonguk Kim edited comment on HIVE-24066 at 7/7/22 11:52 AM:
-

It would be useful if null check for context.os works.

(null check for struct column that not exists in file)


was (Author: JIRAUSER292443):
It seems null check for context.os (struct column that not exists in file) 
should work.

> Hive query on parquet data should identify if column is not present in file 
> schema and show NULL value instead of Exception
> ---
>
> Key: HIVE-24066
> URL: https://issues.apache.org/jira/browse/HIVE-24066
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.5, 3.1.2
>Reporter: Jainik Vora
>Priority: Major
> Attachments: day_01.snappy.parquet
>
>
> I created a hive table containing columns with struct data type 
>   
> {code:java}
> CREATE EXTERNAL TABLE test_dwh.sample_parquet_table (
>   `context` struct<
> `app`: struct<
> `build`: string,
> `name`: string,
> `namespace`: string,
> `version`: string
> >,
> `device`: struct<
> `adtrackingenabled`: boolean,
> `advertisingid`: string,
> `id`: string,
> `manufacturer`: string,
> `model`: string,
> `type`: string
> >,
> `locale`: string,
> `library`: struct<
> `name`: string,
> `version`: string
> >,
> `os`: struct<
> `name`: string,
> `version`: string
> >,
> `screen`: struct<
> `height`: bigint,
> `width`: bigint
> >,
> `network`: struct<
> `carrier`: string,
> `cellular`: boolean,
> `wifi`: boolean
>  >,
> `timezone`: string,
> `userAgent`: string
> >
> ) PARTITIONED BY (day string)
> STORED as PARQUET
> LOCATION 's3://xyz/events'{code}
>  
>  All columns are nullable hence the parquet files read by the table don't 
> always contain all columns. If any file in a partition doesn't have 
> "context.os" struct and if "context.os.name" is queried, Hive throws an 
> exception as below. Same for "context.screen" as well.
>   
> {code:java}
> 2020-10-23T00:44:10,496 ERROR [db58bfe6-d0ca-4233-845a-8a10916c3ff1 
> main([])]: CliDriver (SessionState.java:printError(1126)) - Failed with 
> exception java.io.IOException:java.lang.RuntimeException: Primitive type 
> osshould not doesn't match typeos[name]
> 2020-10-23T00:44:10,496 ERROR [db58bfe6-d0ca-4233-845a-8a10916c3ff1 
> main([])]: CliDriver (SessionState.java:printError(1126)) - Failed with 
> exception java.io.IOException:java.lang.RuntimeException: Primitive type 
> osshould not doesn't match typeos[name]java.io.IOException: 
> java.lang.RuntimeException: Primitive type osshould not doesn't match 
> typeos[name] 
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147)
>   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2208)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
>   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:787)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: java.lang.RuntimeException: Primitive type osshould not doesn't 
> match typeos[name] 
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:330)
>  
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:322)
>  
>   at 
> 

[jira] [Comment Edited] (HIVE-24066) Hive query on parquet data should identify if column is not present in file schema and show NULL value instead of Exception

2022-07-07 Thread Seonguk Kim (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17563720#comment-17563720
 ] 

Seonguk Kim edited comment on HIVE-24066 at 7/7/22 11:33 AM:
-

It seems null check for context.os (struct column that not exists in file) 
should work.


was (Author: JIRAUSER292443):
It seems null check for context.os (column that not exists in file) should work.

> Hive query on parquet data should identify if column is not present in file 
> schema and show NULL value instead of Exception
> ---
>
> Key: HIVE-24066
> URL: https://issues.apache.org/jira/browse/HIVE-24066
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.5, 3.1.2
>Reporter: Jainik Vora
>Priority: Major
> Attachments: day_01.snappy.parquet
>
>
> I created a hive table containing columns with struct data type 
>   
> {code:java}
> CREATE EXTERNAL TABLE test_dwh.sample_parquet_table (
>   `context` struct<
> `app`: struct<
> `build`: string,
> `name`: string,
> `namespace`: string,
> `version`: string
> >,
> `device`: struct<
> `adtrackingenabled`: boolean,
> `advertisingid`: string,
> `id`: string,
> `manufacturer`: string,
> `model`: string,
> `type`: string
> >,
> `locale`: string,
> `library`: struct<
> `name`: string,
> `version`: string
> >,
> `os`: struct<
> `name`: string,
> `version`: string
> >,
> `screen`: struct<
> `height`: bigint,
> `width`: bigint
> >,
> `network`: struct<
> `carrier`: string,
> `cellular`: boolean,
> `wifi`: boolean
>  >,
> `timezone`: string,
> `userAgent`: string
> >
> ) PARTITIONED BY (day string)
> STORED as PARQUET
> LOCATION 's3://xyz/events'{code}
>  
>  All columns are nullable hence the parquet files read by the table don't 
> always contain all columns. If any file in a partition doesn't have 
> "context.os" struct and if "context.os.name" is queried, Hive throws an 
> exception as below. Same for "context.screen" as well.
>   
> {code:java}
> 2020-10-23T00:44:10,496 ERROR [db58bfe6-d0ca-4233-845a-8a10916c3ff1 
> main([])]: CliDriver (SessionState.java:printError(1126)) - Failed with 
> exception java.io.IOException:java.lang.RuntimeException: Primitive type 
> osshould not doesn't match typeos[name]
> 2020-10-23T00:44:10,496 ERROR [db58bfe6-d0ca-4233-845a-8a10916c3ff1 
> main([])]: CliDriver (SessionState.java:printError(1126)) - Failed with 
> exception java.io.IOException:java.lang.RuntimeException: Primitive type 
> osshould not doesn't match typeos[name]java.io.IOException: 
> java.lang.RuntimeException: Primitive type osshould not doesn't match 
> typeos[name] 
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147)
>   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2208)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
>   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:787)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: java.lang.RuntimeException: Primitive type osshould not doesn't 
> match typeos[name] 
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:330)
>  
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:322)
>  
>   at 
> 

[jira] [Comment Edited] (HIVE-24066) Hive query on parquet data should identify if column is not present in file schema and show NULL value instead of Exception

2021-05-06 Thread Shivam Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340225#comment-17340225
 ] 

Shivam Sharma edited comment on HIVE-24066 at 5/6/21, 1:59 PM:
---

I am also facing the same issue.  Can we get an update here?

Can we get some workarounds here other than creating a new table?


was (Author: shivamsharma):
I am also facing the same issue. Can we get update here?

> Hive query on parquet data should identify if column is not present in file 
> schema and show NULL value instead of Exception
> ---
>
> Key: HIVE-24066
> URL: https://issues.apache.org/jira/browse/HIVE-24066
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.5, 3.1.2
>Reporter: Jainik Vora
>Priority: Major
> Attachments: day_01.snappy.parquet
>
>
> I created a hive table containing columns with struct data type 
>   
> {code:java}
> CREATE EXTERNAL TABLE test_dwh.sample_parquet_table (
>   `context` struct<
> `app`: struct<
> `build`: string,
> `name`: string,
> `namespace`: string,
> `version`: string
> >,
> `device`: struct<
> `adtrackingenabled`: boolean,
> `advertisingid`: string,
> `id`: string,
> `manufacturer`: string,
> `model`: string,
> `type`: string
> >,
> `locale`: string,
> `library`: struct<
> `name`: string,
> `version`: string
> >,
> `os`: struct<
> `name`: string,
> `version`: string
> >,
> `screen`: struct<
> `height`: bigint,
> `width`: bigint
> >,
> `network`: struct<
> `carrier`: string,
> `cellular`: boolean,
> `wifi`: boolean
>  >,
> `timezone`: string,
> `userAgent`: string
> >
> ) PARTITIONED BY (day string)
> STORED as PARQUET
> LOCATION 's3://xyz/events'{code}
>  
>  All columns are nullable hence the parquet files read by the table don't 
> always contain all columns. If any file in a partition doesn't have 
> "context.os" struct and if "context.os.name" is queried, Hive throws an 
> exception as below. Same for "context.screen" as well.
>   
> {code:java}
> 2020-10-23T00:44:10,496 ERROR [db58bfe6-d0ca-4233-845a-8a10916c3ff1 
> main([])]: CliDriver (SessionState.java:printError(1126)) - Failed with 
> exception java.io.IOException:java.lang.RuntimeException: Primitive type 
> osshould not doesn't match typeos[name]
> 2020-10-23T00:44:10,496 ERROR [db58bfe6-d0ca-4233-845a-8a10916c3ff1 
> main([])]: CliDriver (SessionState.java:printError(1126)) - Failed with 
> exception java.io.IOException:java.lang.RuntimeException: Primitive type 
> osshould not doesn't match typeos[name]java.io.IOException: 
> java.lang.RuntimeException: Primitive type osshould not doesn't match 
> typeos[name] 
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147)
>   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2208)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
>   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:787)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: java.lang.RuntimeException: Primitive type osshould not doesn't 
> match typeos[name] 
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:330)
>  
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:322)
>  
>   at 
> 

[jira] [Comment Edited] (HIVE-24066) Hive query on parquet data should identify if column is not present in file schema and show NULL value instead of Exception

2020-10-25 Thread Chao Gao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17220493#comment-17220493
 ] 

Chao Gao edited comment on HIVE-24066 at 10/26/20, 5:52 AM:


I could reproduce this issue using the given PARQUET file. 

When running the following query, it shows NULL.
{code:java}
hive> select context.os from sample_parquet_table;
OK
NULL
NULL
NULL
NULL
NULL{code}
When running the following query, it throws the exception, which is expected 
showing NULL as well.
{code:java}
hive> select context.os.name from sample_parquet_table;
OK
Failed with exception java.io.IOException:java.lang.RuntimeException: Primitive 
type osshould not doesn't match typeos[name]

{code}
 

 


was (Author: chaoga):
I could reproduce this issue using the given PARQUET file. 

When running the following query, it shows NULL.
hive> select context.os from sample_parquet_table;
OK
NULL
NULL
NULL
NULL
NULL
When running the following query, it throws the exception, which is expected 
showing NULL as well.
hive> select context.os.name from sample_parquet_table;
OK
Failed with exception java.io.IOException:java.lang.RuntimeException: Primitive 
type osshould not doesn't match typeos[name]


 

 

> Hive query on parquet data should identify if column is not present in file 
> schema and show NULL value instead of Exception
> ---
>
> Key: HIVE-24066
> URL: https://issues.apache.org/jira/browse/HIVE-24066
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.2, 2.3.5
>Reporter: Jainik Vora
>Priority: Major
> Attachments: day_01.snappy.parquet
>
>
> I created a hive table containing columns with struct data type 
>   
> {code:java}
> CREATE EXTERNAL TABLE test_dwh.sample_parquet_table (
>   `context` struct<
> `app`: struct<
> `build`: string,
> `name`: string,
> `namespace`: string,
> `version`: string
> >,
> `device`: struct<
> `adtrackingenabled`: boolean,
> `advertisingid`: string,
> `id`: string,
> `manufacturer`: string,
> `model`: string,
> `type`: string
> >,
> `locale`: string,
> `library`: struct<
> `name`: string,
> `version`: string
> >,
> `os`: struct<
> `name`: string,
> `version`: string
> >,
> `screen`: struct<
> `height`: bigint,
> `width`: bigint
> >,
> `network`: struct<
> `carrier`: string,
> `cellular`: boolean,
> `wifi`: boolean
>  >,
> `timezone`: string,
> `userAgent`: string
> >
> ) PARTITIONED BY (day string)
> STORED as PARQUET
> LOCATION 's3://xyz/events'{code}
>  
>  All columns are nullable hence the parquet files read by the table don't 
> always contain all columns. If any file in a partition doesn't have 
> "context.os" struct and if "context.os.name" is queried, Hive throws an 
> exception as below. Same for "context.screen" as well.
>   
> {code:java}
> 2020-10-23T00:44:10,496 ERROR [db58bfe6-d0ca-4233-845a-8a10916c3ff1 
> main([])]: CliDriver (SessionState.java:printError(1126)) - Failed with 
> exception java.io.IOException:java.lang.RuntimeException: Primitive type 
> osshould not doesn't match typeos[name]
> 2020-10-23T00:44:10,496 ERROR [db58bfe6-d0ca-4233-845a-8a10916c3ff1 
> main([])]: CliDriver (SessionState.java:printError(1126)) - Failed with 
> exception java.io.IOException:java.lang.RuntimeException: Primitive type 
> osshould not doesn't match typeos[name]java.io.IOException: 
> java.lang.RuntimeException: Primitive type osshould not doesn't match 
> typeos[name] 
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147)
>   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2208)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
>   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:787)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
>