[ https://issues.apache.org/jira/browse/PIG-5400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058250#comment-17058250 ]
Koji Noguchi commented on PIG-5400:
-----------------------------------

Attached {{pig-5400-v01.patch}}, which tries to keep the current behavior unchanged and adds an OrcStorage option, {{-k}} or {{--keepSingleFieldTuple}}, to stop dropping a struct(tuple) inside an array(bag) even when it holds only a single field.

This patch also fixes a bug: when OrcStorage read an array(bag) of primitive types, Pig was setting an empty inner schema.

{noformat}
Before
struct<a:array<int>> --> a:{()}
After
struct<a:array<int>> --> a:{(int)}
{noformat}

> OrcStorage dropping struct(tuple) when it only holds a single field inside a Bag(array)
> ---------------------------------------------------------------------------------------
>
>                 Key: PIG-5400
>                 URL: https://issues.apache.org/jira/browse/PIG-5400
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Minor
>         Attachments: pig-5400-v01.patch
>
>
> A user asked me about an inconsistent schema they were seeing when storing with OrcStorage. Sample code:
> {code}
> A = load 'input.txt' as (a0:long);
> B = GROUP A by a0;
> STORE B into 'filename' using OrcStorage();
> {code}
> Pig's schema {{B: {group: long,A: bag: { tuple(a0: long)}}}}.
> Expected Orc schema {{struct<group:bigint,A:array<struct<bigint>>>}}
> Actual Orc schema {{struct<group:bigint,A:array<bigint>>}}
> _This only happens when a tuple contains a single field._
> The current schema without the struct(tuple) layer saves space, but it would be nice to have an option to keep the extra struct(tuple) layer if the user expects schema evolution within that tuple, i.e. adding more fields in the future.
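
The store-side behavior described in this issue can be sketched as a toy model. This is not Pig's OrcStorage code or the contents of the patch. The function {{to_orc}}, the parameter {{keep_single_field_tuple}}, and the nested-tuple encoding of Pig types are all hypothetical names chosen for illustration. The sketch also assumes that the inner struct carries its field name ({{a0}}) when it is kept.

```python
# Toy model only: NOT Pig's actual OrcStorage implementation.
# Pig types are encoded (hypothetically) as nested Python tuples:
#   ("long",)                        primitive
#   ("tuple", [(name, type), ...])   tuple with named fields
#   ("bag", tuple_type)              bag of tuples

PRIMITIVES = {"long": "bigint", "int": "int", "chararray": "string"}


def to_orc(pig_type, keep_single_field_tuple=False):
    """Render a toy Pig type as an ORC-style type string."""
    kind = pig_type[0]
    if kind in PRIMITIVES:
        return PRIMITIVES[kind]
    if kind == "tuple":
        fields = ",".join(
            f"{name}:{to_orc(t, keep_single_field_tuple)}"
            for name, t in pig_type[1]
        )
        return f"struct<{fields}>"
    if kind == "bag":
        element = pig_type[1]          # the tuple type held by the bag
        fields = element[1]
        if len(fields) == 1 and not keep_single_field_tuple:
            # Behavior reported in the issue: the single-field
            # struct(tuple) layer is dropped.
            inner = to_orc(fields[0][1], keep_single_field_tuple)
            return f"array<{inner}>"
        # Multi-field tuples, or the opt-in flag, keep the struct layer.
        return f"array<{to_orc(element, keep_single_field_tuple)}>"
    raise ValueError(f"unsupported toy type: {kind!r}")


# Schema of B from the sample script: {group: long, A: {(a0: long)}}
B = ("tuple", [
    ("group", ("long",)),
    ("A", ("bag", ("tuple", [("a0", ("long",))]))),
])

print(to_orc(B))                                # struct layer dropped
print(to_orc(B, keep_single_field_tuple=True))  # struct layer kept
```

In this model the flag only matters for bags whose tuple has exactly one field. Bags of multi-field tuples render the same way in both modes, which matches the note in the issue that the problem only appears for single-field tuples.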