[ https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060114#comment-13060114 ]
Patrick Hunt commented on PIG-1890:
-----------------------------------

Hi, I'm seeing an issue with both versions of the attached patches when I run the following:

{noformat}
REGISTER avro-1.4.1.jar;
REGISTER json-simple-1.1.jar;
REGISTER piggybank.jar;
A = LOAD 'input_123.avro' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
B = LOAD 'input_789.avro' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
C = UNION A, B;
DUMP C;
{noformat}

where each file contains a single tuple: input_123.avro contains "1,2,3" (ints) and input_789.avro contains "7,8,9".

DUMP C should return 2 tuples: one tuple 1,2,3 and one tuple 7,8,9.

Without the patch I see 6 tuples output (three 1,2,3 and three 7,8,9).
With either of the proposed patches applied I see 4 tuples output (two 1,2,3 and two 7,8,9).

From looking at other Pig loader functions, it seems like the following would address the setLocation issue:

{noformat}
 public void setLocation(String location, Job job) throws IOException {
-    if(AvroStorageUtils.addInputPaths(location, job) && inputAvroSchema == null) {
-        inputAvroSchema = getAvroSchema(location, job);
-    }
+    FileInputFormat.setInputPaths(job, location);
+    inputAvroSchema = getAvroSchema(location, job);
 }
{noformat}

This does resolve the issue for the script I described. However, the "addInputPaths" functionality of AvroStorageUtils is lost. I'm wondering why it was added rather than just relying on the standard capabilities of LOAD (such as globbing)?

I'd be happy to package up my suggestion as a patch if there's interest.

> Fix piggybank unit test TestAvroStorage
> ---------------------------------------
>
>                 Key: PIG-1890
>                 URL: https://issues.apache.org/jira/browse/PIG-1890
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>   Affects Versions: 0.9.0
>            Reporter: Daniel Dai
>            Assignee: Jakob Homan
>         Attachments: PIG-1890-1.patch, PIG-1890-2.patch
>
>
> TestAvroStorage fails on trunk. There are two reasons:
> 1. After PIG-1680, we call LoadFunc.setLocation one more time.
> 2. The schema for AvroStorage seems to be wrong. For example, in the first test
> case, testArrayDefault, the schema for "in" is set to "PIG_WRAPPER: (FIELD:
> {PIG_WRAPPER: (ARRAY_ELEM: float)})". It seems PIG_WRAPPER is redundant. This
> issue was hidden until PIG-1188 was checked in.
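
For illustration only: the issue description notes that LoadFunc.setLocation is now called more than once, so a loader that appends its location on every call can end up reading the same file repeatedly. The sketch below simulates that without any Hadoop or Pig dependency. FakeJob, setLocationAppending, setLocationOverwriting, and pathsAfter are hypothetical names made up for this example, not part of AvroStorage or AvroStorageUtils, and the count of three calls is an assumption chosen to mirror the "three of each tuple" observation. It is not a claim about how addInputPaths is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

public class SetLocationDemo {
    /** Hypothetical stand-in for the input-path list held in a job's configuration. */
    static class FakeJob {
        final List<String> inputPaths = new ArrayList<>();
    }

    // Append semantics: every call adds the location again,
    // so repeated setLocation calls accumulate duplicates.
    static void setLocationAppending(String location, FakeJob job) {
        job.inputPaths.add(location);
    }

    // Overwrite semantics: every call replaces the previous value,
    // so repeated setLocation calls are idempotent.
    static void setLocationOverwriting(String location, FakeJob job) {
        job.inputPaths.clear();
        job.inputPaths.add(location);
    }

    // Calls the chosen setLocation variant `calls` times on a fresh job
    // and returns the resulting input-path list.
    static List<String> pathsAfter(int calls, boolean append) {
        FakeJob job = new FakeJob();
        for (int i = 0; i < calls; i++) {
            if (append) {
                setLocationAppending("input_123.avro", job);
            } else {
                setLocationOverwriting("input_123.avro", job);
            }
        }
        return job.inputPaths;
    }

    public static void main(String[] args) {
        System.out.println("append:    " + pathsAfter(3, true));
        System.out.println("overwrite: " + pathsAfter(3, false));
    }
}
```

With append semantics the path is listed once per call, while with overwrite semantics it is listed once regardless of how many times setLocation runs. That is the property the proposed FileInputFormat.setInputPaths change relies on.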