[ https://issues.apache.org/jira/browse/HIVE-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14280948#comment-14280948 ]
Sergio Peña commented on HIVE-9303:
-----------------------------------
I see that 'select ... parq_tbl' is doing the correct job of displaying the
struct fields based on the definition level.
So I assume the problem is in creating parq_tbl from text_tbl: text_tbl has
NULL rows, and Parquet is writing them with a definition level of n-1 instead
of 0, right? That is, instead of recording a:NULL (level 0), it is recording
a:b:NULL (level 2).
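To make the levels concrete, here is a rough sketch in plain Python (not Hive
or parquet-mr code) of how I understand the definition level for the column
a.b.c, assuming every field in STRUCT<b:STRUCT<c:INT>> is optional so that
n = 3. The function names and the dict-based row layout are made up for
illustration only.

```python
# Column path for a STRUCT<b:STRUCT<c:INT>>; every field is assumed optional,
# so the maximum definition level n equals len(PATH) == 3.
PATH = ("a", "b", "c")


def definition_level(row, path=PATH):
    """Count how many optional fields along the path are non-NULL."""
    level = 0
    current = row
    for name in path:
        current = current.get(name) if isinstance(current, dict) else None
        if current is None:
            break  # first NULL on the path: nothing deeper is defined
        level += 1
    return level


def reconstruct(level, leaf=None, path=PATH):
    """Rebuild the value of the top-level column from a definition level."""
    if level == 0:
        return None  # the whole struct 'a' is NULL
    inner = leaf if level == len(path) else None
    # Wrap the innermost value in each defined field below the top-level one.
    for name in reversed(path[1:level + 1]):
        inner = {name: inner}
    return inner


# The row from the issue: 'a' itself is NULL, so the level should be 0.
print(definition_level({"a": None}))   # 0
print(reconstruct(0))                  # None
# What the Parquet table seems to contain instead: level n-1 == 2.
print(reconstruct(2))                  # {'b': {'c': None}}
```

If this reading is right, a writer that always emits level 2 for a NULL
struct would produce exactly the {"b":{"c":null}} result shown in the issue.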
> Parquet files are written with incorrect definition levels
> ----------------------------------------------------------
>
> Key: HIVE-9303
> URL: https://issues.apache.org/jira/browse/HIVE-9303
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.13.1
> Reporter: Skye Wanderman-Milne
>
> The definition level, which determines which level of nesting is NULL,
> appears to always be n or n-1, where n is the maximum definition level. This
> means that only the innermost level of nesting can be NULL. This is only
> relevant for Parquet files. For example:
> {code:sql}
> CREATE TABLE text_tbl (a STRUCT<b:STRUCT<c:INT>>)
> STORED AS TEXTFILE;
> INSERT OVERWRITE TABLE text_tbl
> SELECT IF(false, named_struct("b", named_struct("c", 1)), NULL)
> FROM tbl LIMIT 1;
> CREATE TABLE parq_tbl
> STORED AS PARQUET
> AS SELECT * FROM text_tbl;
> SELECT * FROM text_tbl;
> => NULL # right
> SELECT * FROM parq_tbl;
> => {"b":{"c":null}} # wrong
> {code}