[ https://issues.apache.org/jira/browse/PARQUET-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153793#comment-14153793 ]

Ryan Blue commented on PARQUET-110:
-----------------------------------

[~julienledem] and [~tianshuo], I've posted a test and a possible fix for this 
in [PR #70|https://github.com/apache/incubator-parquet-mr/pull/70], but I don't 
know the Pig integration well enough to spot unintended consequences. Ideally, 
we'd fix this upstream in Pig, but it is causing problems now, and I think 
the workaround in the PR is safe.

> Some schemas without column projection cause Pig failures
> ---------------------------------------------------------
>
>                 Key: PARQUET-110
>                 URL: https://issues.apache.org/jira/browse/PARQUET-110
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>            Reporter: Ryan Blue
>
> Parquet stores and loads the Pig schema in the Configuration. Along the way, 
> Pig changes that Schema:
> {code:java}
> // This schema is converted from Parquet and written in Configuration
> String schemaStr = "my_list: {array: (array_element: (num1: int,num2: int))}";
> // Reparsed using org.apache.pig.impl.util.Utils
> Schema schema = Utils.getSchemaFromString(schemaStr);
> // But no longer matches the original structure
> schema.toString();
> // => {my_list: {array_element: (num1: int,num2: int)}}
> {code}
> Note that the intermediate bag, named either "bag" or "array", is removed 
> when Pig reparses the Schema. I can work around this to an extent in the 
> Parquet code, but the Pig behavior gets stranger. If there are two such 
> intermediate bags, the second is preserved but renamed to "bag_0". Something funny is 
> going on there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
