Koji Noguchi created PIG-5399:
---------------------------------
Summary: OrcStorage dropping Tuple(struct) schema when Tuple only has one field
Key: PIG-5399
URL: https://issues.apache.org/jira/browse/PIG-5399
Project: Pig
Issue Type: Improvement
Reporter: Koji Noguchi
Assignee: Koji Noguchi
A user reported seeing an inconsistent schema when storing data with
OrcStorage.
Sample code
{code}
A = load 'input.txt' as (a0:long);
B = GROUP A by a0;
STORE B into 'filename' using OrcStorage();
{code}
Pig's schema
{{B: {group: long,A: bag: { tuple(a0: long)}}}}.
Expected Orc schema
{{struct<group:bigint,A:array<struct<a0:bigint>>>}}
Actual Orc schema
{{struct<group:bigint,A:array<bigint>>}}
_This only happens when a tuple contains a single field._
The current schema, without the struct (tuple) layer, saves space. However, it
would be nice to have an option to keep the extra struct (tuple) layer for users
who expect the schema to evolve within that tuple by adding more fields later.
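To make the difference concrete, here is a minimal Python sketch of the Pig-to-ORC type conversion described above. It is not OrcStorage's actual code: PigScalar, PigTuple, PigBag, Field, to_orc, and the keep_single_field_tuple flag are all hypothetical names, and the collapse rule is modelled only on the example in this issue (a bag whose tuple has exactly one field).

```python
from dataclasses import dataclass
from typing import List


# Hypothetical, simplified model of Pig types; NOT Pig's ResourceSchema API.
@dataclass
class PigScalar:
    orc_name: str  # ORC type name, e.g. "bigint" for Pig's long


@dataclass
class Field:
    name: str
    type: object  # PigScalar, PigTuple, or PigBag


@dataclass
class PigTuple:
    fields: List[Field]


@dataclass
class PigBag:
    tuple: PigTuple


def to_orc(t, keep_single_field_tuple=False):
    """Render a Pig type as an ORC type string.

    keep_single_field_tuple is a hypothetical option (the one requested in
    this issue); it is not an existing OrcStorage parameter.
    """
    if isinstance(t, PigScalar):
        return t.orc_name
    if isinstance(t, PigTuple):
        inner = ",".join(
            f"{f.name}:{to_orc(f.type, keep_single_field_tuple)}" for f in t.fields
        )
        return f"struct<{inner}>"
    if isinstance(t, PigBag):
        fields = t.tuple.fields
        if len(fields) == 1 and not keep_single_field_tuple:
            # Behavior reported in this issue: the struct layer is dropped and
            # the bag becomes an array of the single field's type.
            return f"array<{to_orc(fields[0].type, keep_single_field_tuple)}>"
        # Requested behavior: keep the struct so fields can be added later.
        return f"array<{to_orc(t.tuple, keep_single_field_tuple)}>"
    raise TypeError(f"unsupported type: {t!r}")


# B: {group: long, A: {(a0: long)}} from the sample script above.
B = PigTuple([
    Field("group", PigScalar("bigint")),
    Field("A", PigBag(PigTuple([Field("a0", PigScalar("bigint"))]))),
])
print(to_orc(B))                                # struct<group:bigint,A:array<bigint>>
print(to_orc(B, keep_single_field_tuple=True))  # struct<group:bigint,A:array<struct<a0:bigint>>>
```

With the flag off, the sketch reproduces the "Actual" schema; with it on, the single-field tuple keeps its struct wrapper, so adding a second field later only extends {{struct<a0:bigint>}} rather than changing {{array<bigint>}} into an array of structs.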
--
This message was sent by Atlassian Jira
(v8.3.4#803005)