[ https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060114#comment-13060114 ]
Patrick Hunt commented on PIG-1890:
-----------------------------------
Hi, I'm seeing an issue with both versions of the attached patches when I run
the following:
{noformat}
REGISTER avro-1.4.1.jar;
REGISTER json-simple-1.1.jar;
REGISTER piggybank.jar;
A = LOAD 'input_123.avro' USING
org.apache.pig.piggybank.storage.avro.AvroStorage();
B = LOAD 'input_789.avro' USING
org.apache.pig.piggybank.storage.avro.AvroStorage();
C = UNION A, B;
DUMP C;
{noformat}
where each file contains a single tuple: input_123.avro contains "1,2,3" (ints)
and input_789.avro contains "7,8,9".
DUMP C should return 2 tuples: one 1,2,3 and one 7,8,9.
Without the patch I see 6 tuples output (three 1,2,3 and three 7,8,9).
With either of the proposed patches applied I see 4 tuples output (two 1,2,3 and
two 7,8,9).
From looking at other Pig loader functions, it seems like the following would
address the setLocation issue:
{noformat}
public void setLocation(String location, Job job) throws IOException {
- if(AvroStorageUtils.addInputPaths(location, job) && inputAvroSchema == null) {
- inputAvroSchema = getAvroSchema(location, job);
- }
+ FileInputFormat.setInputPaths(job, location);
+ inputAvroSchema = getAvroSchema(location, job);
}
{noformat}
This does resolve the issue for the script I described. However, the
"addInputPaths" functionality of AvroStorageUtils is lost. I'm wondering why
it was added rather than just relying on the standard capabilities of LOAD (such
as globbing).
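To illustrate why replacing the paths might matter when setLocation is called more than once, here is a minimal, self-contained sketch. It does not use Hadoop or Pig at all: FakeJob, addInputPath, setInputPaths and readsAfter are hypothetical stand-ins written for this example, and it assumes (not confirmed here) that AvroStorageUtils.addInputPaths appends to the job's input path list on every call, whereas FileInputFormat.setInputPaths replaces the list.

```java
import java.util.ArrayList;
import java.util.List;

public class SetLocationSketch {
    /** Hypothetical stand-in for the input-path list kept in a job's configuration. */
    static final class FakeJob {
        final List<String> inputPaths = new ArrayList<>();
    }

    /** Appending behaviour (assumed): every call adds the location again. */
    static void addInputPath(FakeJob job, String location) {
        job.inputPaths.add(location);
    }

    /** Replacing behaviour: the list is reset first, so repeated calls are idempotent. */
    static void setInputPaths(FakeJob job, String location) {
        job.inputPaths.clear();
        job.inputPaths.add(location);
    }

    /** Number of path entries left in the job after setLocation has run `calls` times. */
    static int readsAfter(int calls, boolean append) {
        FakeJob job = new FakeJob();
        for (int i = 0; i < calls; i++) {
            if (append) {
                addInputPath(job, "input_123.avro");
            } else {
                setInputPaths(job, "input_123.avro");
            }
        }
        return job.inputPaths.size();
    }

    public static void main(String[] args) {
        System.out.println("append:  " + readsAfter(3, true));
        System.out.println("replace: " + readsAfter(3, false));
    }
}
```

Under those assumptions, three calls with the appending variant leave three entries for the same file (which would line up with each tuple appearing three times), while the replacing variant always leaves one.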
I'd be happy to package up my suggestion as a patch if there's interest.
> Fix piggybank unit test TestAvroStorage
> ---------------------------------------
>
> Key: PIG-1890
> URL: https://issues.apache.org/jira/browse/PIG-1890
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.9.0
> Reporter: Daniel Dai
> Assignee: Jakob Homan
> Attachments: PIG-1890-1.patch, PIG-1890-2.patch
>
>
> TestAvroStorage fails on trunk. There are two reasons:
> 1. After PIG-1680, we call LoadFunc.setLocation one more time.
> 2. The schema for AvroStorage seems to be wrong. For example, in the first test
> case, testArrayDefault, the schema for "in" is set to "PIG_WRAPPER: (FIELD:
> {PIG_WRAPPER: (ARRAY_ELEM: float)})". It seems PIG_WRAPPER is redundant. This
> issue was hidden until PIG-1188 was checked in.