[ https://issues.apache.org/jira/browse/PIG-5400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Koji Noguchi updated PIG-5400:
------------------------------
Status: Patch Available (was: Open)
> OrcStorage dropping struct(tuple) when it only holds a single field inside a
> Bag(array)
> ---------------------------------------------------------------------------------------
>
> Key: PIG-5400
> URL: https://issues.apache.org/jira/browse/PIG-5400
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Priority: Minor
> Attachments: pig-5400-v01.patch
>
>
> A user reported seeing an inconsistent schema when storing data with
> OrcStorage. Sample code:
> {code}
> A = load 'input.txt' as (a0:long);
> B = GROUP A by a0;
> STORE B into 'filename' using OrcStorage();
> {code}
> Pig's schema: {{B: {group: long,A: bag: { tuple(a0: long)}}}}
> Expected Orc schema: {{struct<group:bigint,A:array<struct<bigint>>>}}
> Actual Orc schema: {{struct<group:bigint,A:array<bigint>>}}
> _This only happens when a tuple contains a single field._
> The current schema without the struct(tuple) layer saves space, but it would
> be nice to have an option to keep the extra struct(tuple) layer for users who
> expect the schema of that tuple to evolve by adding more fields in the
> future.
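
To make the reported behavior concrete, here is a minimal Python sketch of a Pig-to-ORC type mapping that collapses a single-field tuple inside a bag, plus a flag that keeps the struct layer. It is a simplified model built only from the description above, not Pig's actual OrcStorage code: the names `PRIMITIVES`, `orc_type`, `struct_type` and the `keep_single_field_tuple` flag are hypothetical. Unlike the expected schema quoted above, the sketch also carries the inner field name (`a0`) into the struct.

```python
# Simplified model of the schema mapping described in this issue.
# All names here are hypothetical; this is not Pig's OrcStorage implementation.

PRIMITIVES = {"long": "bigint", "int": "int", "chararray": "string", "double": "double"}


def struct_type(fields, keep_single_field_tuple=False):
    """Render a list of (name, pig_type) pairs as an ORC struct type string."""
    inner = ",".join(
        f"{name}:{orc_type(pig_type, keep_single_field_tuple)}"
        for name, pig_type in fields
    )
    return f"struct<{inner}>"


def orc_type(pig_type, keep_single_field_tuple=False):
    """Map a Pig type to an ORC type string.

    A Pig type is either a primitive name (str), ("tuple", fields),
    or ("bag", fields), where fields describes the bag's inner tuple.
    """
    if isinstance(pig_type, str):
        return PRIMITIVES[pig_type]
    kind, fields = pig_type
    if kind == "tuple":
        return struct_type(fields, keep_single_field_tuple)
    if kind == "bag":
        if len(fields) == 1 and not keep_single_field_tuple:
            # Reported behavior: the struct layer around a lone field is dropped.
            return f"array<{orc_type(fields[0][1], keep_single_field_tuple)}>"
        return f"array<{struct_type(fields, keep_single_field_tuple)}>"
    raise ValueError(f"unsupported Pig type: {kind}")


# Schema of B from the sample script: {group: long, A: bag{tuple(a0: long)}}
B = [("group", "long"), ("A", ("bag", [("a0", "long")]))]

collapsed = struct_type(B)
preserved = struct_type(B, keep_single_field_tuple=True)
```

In this model, `collapsed` reproduces the actual schema from the report, while `preserved` keeps the struct layer, so a later schema with an added field (for example `a1`) only extends the struct instead of changing the array's element type.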