[ https://issues.apache.org/jira/browse/PARQUET-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153793#comment-14153793 ]
Ryan Blue commented on PARQUET-110:
-----------------------------------
[~julienledem] and [~tianshuo], I've posted a test and a possible fix for this
in [PR #70|https://github.com/apache/incubator-parquet-mr/pull/70], but I don't
know the Pig integration well enough to spot unintended consequences. Ideally,
we'd fix this upstream in Pig, but the bug is causing problems now and I think
the workaround in the PR is safe.
> Some schemas without column projection cause Pig failures
> ---------------------------------------------------------
>
> Key: PARQUET-110
> URL: https://issues.apache.org/jira/browse/PARQUET-110
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Reporter: Ryan Blue
>
> Parquet stores and loads the Pig schema in the Configuration. Along the way,
> Pig changes that Schema:
> {code:java}
> // This schema is converted from Parquet and written in Configuration
> String schemaStr = "my_list: {array: (array_element: (num1: int,num2: int))}";
> // Reparsed using org.apache.pig.impl.util.Utils
> Schema schema = Utils.getSchemaFromString(schemaStr);
> // The reparsed schema no longer matches the original structure
> schema.toString();
> // => {my_list: {array_element: (num1: int,num2: int)}}
> {code}
> Note that the intermediate bag, named either "bag" or "array", is removed
> when Pig reparses the Schema. I can work around this to an extent in the
> Parquet code, but the Pig behavior gets stranger. If there are two of
> these, the second is preserved but renamed to "bag_0". Something funny is
> going on there.
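
To see which name goes missing in the round trip described above, the two schema strings can be compared by alias. This is only a rough sketch: it runs without Pig on the classpath by treating any identifier followed by a colon as an alias, and the class and method names (SchemaAliasDiff, aliases, droppedAliases) are hypothetical helpers, not part of Parquet or Pig. The "after" string is copied from the toString() output quoted in the description rather than produced by calling Pig.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Parquet or Pig.
final class SchemaAliasDiff {
    // Assumption: an alias is an identifier immediately followed by a colon.
    private static final Pattern ALIAS =
        Pattern.compile("([A-Za-z_][A-Za-z0-9_]*)\\s*:");

    // Collects aliases in the order they appear in the schema string.
    public static List<String> aliases(String schemaStr) {
        List<String> names = new ArrayList<>();
        Matcher m = ALIAS.matcher(schemaStr);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Returns aliases present in 'before' that have no counterpart in 'after'.
    public static List<String> droppedAliases(String before, String after) {
        List<String> dropped = new ArrayList<>(aliases(before));
        for (String name : aliases(after)) {
            dropped.remove(name); // removes the first matching occurrence only
        }
        return dropped;
    }

    public static void main(String[] args) {
        // Both strings are taken from the issue description.
        String before = "my_list: {array: (array_element: (num1: int,num2: int))}";
        String after = "{my_list: {array_element: (num1: int,num2: int)}}";
        System.out.println(droppedAliases(before, after));
    }
}
```

A check like this only flags missing names. It would not explain the "bag_0" renaming, which needs a real Pig round trip to reproduce.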