[ https://issues.apache.org/jira/browse/HIVE-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14280948#comment-14280948 ]
Sergio Peña commented on HIVE-9303:
-----------------------------------

I see that 'select ... parquet_tbl' is doing the correct job of displaying the struct fields based on the definition level.

So I assume the problem occurs when creating the parquet_tbl from text_tbl: text_tbl has NULL rows, and Parquet is copying them with a definition level of (n-1) instead of 0, right? That is, instead of recording a:NULL (level 0), it is recording a:b:NULL (level 2).

> Parquet files are written with incorrect definition levels
> ----------------------------------------------------------
>
>                 Key: HIVE-9303
>                 URL: https://issues.apache.org/jira/browse/HIVE-9303
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.13.1
>            Reporter: Skye Wanderman-Milne
>
> The definition level, which determines which level of nesting is NULL,
> appears to always be n or n-1, where n is the maximum definition level. This
> means that only the innermost level of nesting can be NULL. This is only
> relevant for Parquet files. For example:
> {code:sql}
> CREATE TABLE text_tbl (a STRUCT<b:STRUCT<c:INT>>)
> STORED AS TEXTFILE;
> INSERT OVERWRITE TABLE text_tbl
> SELECT IF(false, named_struct("b", named_struct("c", 1)), NULL)
> FROM tbl LIMIT 1;
> CREATE TABLE parq_tbl
> STORED AS PARQUET
> AS SELECT * FROM text_tbl;
> SELECT * FROM text_tbl;
> => NULL # right
> SELECT * FROM parq_tbl;
> => {"b":{"c":null}} # wrong
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
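
To make the level numbering in this thread concrete, here is a minimal Python sketch of how a definition level could be computed for the leaf column a.b.c. It is not Hive's or Parquet's writer code. It assumes that all three fields (a, b, c) are optional and that there are no repeated fields, so the maximum definition level n is 3. The function name definition_level and the dict-based row representation are hypothetical, chosen only for illustration.

```python
from typing import Any, Optional, Sequence


def definition_level(row: Optional[dict], path: Sequence[str]) -> int:
    """Count how many optional fields along `path` are non-NULL.

    Assumption: every field on the path is optional and none is repeated,
    so the maximum level equals len(path).
    """
    level = 0
    current: Any = row
    for name in path:
        if current is None:
            break
        current = current.get(name)
        if current is None:
            # This field is NULL; deeper fields are undefined.
            break
        level += 1
    return level


path = ["a", "b", "c"]

# Row as stored in text_tbl: the whole struct `a` is NULL.
level_a_null = definition_level({"a": None}, path)

# Row as read back from parq_tbl: a and b are defined, only c is NULL.
level_c_null = definition_level({"a": {"b": {"c": None}}}, path)

# Fully defined row, for comparison.
level_full = definition_level({"a": {"b": {"c": 1}}}, path)

print(level_a_null, level_c_null, level_full)
```

Under these assumptions, the text_tbl row should be written with level 0, while the value read back from parq_tbl corresponds to level 2, that is n-1, which matches the behaviour described in the report.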