Ryan Blue created PARQUET-110:
---------------------------------
Summary: Some schemas without column projection cause Pig failures
Key: PARQUET-110
URL: https://issues.apache.org/jira/browse/PARQUET-110
Project: Parquet
Issue Type: Bug
Components: parquet-mr
Reporter: Ryan Blue
Parquet stores the Pig schema in the Configuration as a string and reparses it on load. Along the way,
Pig changes that schema:
{code:java}
import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.impl.util.Utils;

// This schema is converted from Parquet and written to the Configuration
String schemaStr = "my_list: {array: (array_element: (num1: int,num2: int))}";
// Reparsed using org.apache.pig.impl.util.Utils
Schema schema = Utils.getSchemaFromString(schemaStr);
// But no longer matches the original structure
schema.toString();
// => {my_list: {array_element: (num1: int,num2: int)}}
{code}
Note that the intermediate bag, named either "bag" or "array", is removed when
Pig reparses the schema. I can work around this to an extent in the Parquet
code, but the Pig behavior gets stranger: if the schema contains two such bags,
the second is preserved but renamed to "bag_0". Something odd is going on there.
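To make the reported shape change concrete, here is a minimal, self-contained Java sketch. It does not call Pig: `PigBagCollapseSketch`, `Field`, `isCollapsible` and `collapse` are hypothetical names, and the rule they encode (a bag whose inner tuple holds exactly one field that is itself a tuple loses that wrapper level) is inferred from the single example above, not taken from Pig's source. Schema strings are read with Pig's notation, where `{...}` is a bag and `(...)` a tuple. The "bag_0" renaming is not modeled.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical model of a Pig-style schema; mirrors only the one rewrite seen in this report. */
public class PigBagCollapseSketch {

    enum Kind { ATOM, TUPLE, BAG }

    // A named field: an atom (e.g. int), a tuple of fields, or a bag holding one tuple.
    static final class Field {
        final String name;
        final Kind kind;
        final String atomType;      // only used when kind == ATOM
        final List<Field> children; // tuple fields, or the single inner tuple of a bag

        Field(String name, Kind kind, String atomType, List<Field> children) {
            this.name = name;
            this.kind = kind;
            this.atomType = atomType;
            this.children = children;
        }

        static Field atom(String name, String type) { return new Field(name, Kind.ATOM, type, List.of()); }
        static Field tuple(String name, Field... fields) { return new Field(name, Kind.TUPLE, null, List.of(fields)); }
        static Field bag(String name, Field innerTuple) { return new Field(name, Kind.BAG, null, List.of(innerTuple)); }
    }

    // Renders a field in the same textual form as the schema strings in the report.
    static String render(Field f) {
        switch (f.kind) {
            case ATOM:
                return f.name + ": " + f.atomType;
            case TUPLE:
                return f.name + ": (" + f.children.stream()
                        .map(PigBagCollapseSketch::render)
                        .collect(Collectors.joining(",")) + ")";
            default: // BAG
                return f.name + ": {" + render(f.children.get(0)) + "}";
        }
    }

    // Assumed rule: a bag whose inner tuple has exactly one field, and that field is a tuple.
    static boolean isCollapsible(Field f) {
        if (f.kind != Kind.BAG) return false;
        Field inner = f.children.get(0);
        return inner.children.size() == 1 && inner.children.get(0).kind == Kind.TUPLE;
    }

    // Applies that rewrite bottom-up: the wrapper level ("array" or "bag") is dropped
    // and its single tuple field becomes the bag's inner tuple.
    static Field collapse(Field f) {
        List<Field> kids = new ArrayList<>();
        for (Field c : f.children) kids.add(collapse(c));
        Field rebuilt = new Field(f.name, f.kind, f.atomType, kids);
        if (isCollapsible(rebuilt)) {
            return Field.bag(f.name, rebuilt.children.get(0).children.get(0));
        }
        return rebuilt;
    }

    public static void main(String[] args) {
        Field myList = Field.bag("my_list",
                Field.tuple("array",
                        Field.tuple("array_element",
                                Field.atom("num1", "int"),
                                Field.atom("num2", "int"))));
        System.out.println(render(myList));
        System.out.println(isCollapsible(myList));
        System.out.println(render(collapse(myList)));
    }
}
{code}

Running {{main}} should print the original schema string, {{true}}, and then the collapsed form shown in the report (minus the outer braces that {{Schema.toString()}} adds). A predicate like {{isCollapsible}} is one possible way a Parquet-side workaround could flag affected fields before the schema is written to the Configuration, assuming the inferred rule holds beyond this one example.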