[ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060114#comment-13060114
 ] 

Patrick Hunt commented on PIG-1890:
-----------------------------------

Hi, I'm seeing an issue with both versions of the attached patches when I run 
the following:

{noformat}
REGISTER avro-1.4.1.jar; 
REGISTER json-simple-1.1.jar; 
REGISTER piggybank.jar;

A = LOAD 'input_123.avro' USING 
org.apache.pig.piggybank.storage.avro.AvroStorage();

B = LOAD 'input_789.avro' USING 
org.apache.pig.piggybank.storage.avro.AvroStorage();

C = UNION A, B; 
DUMP C;
{noformat}

where each file contains a single tuple: input_123.avro contains "1,2,3" (ints) 
and input_789.avro contains "7,8,9".
DUMP C should return 2 tuples: one tuple 1,2,3 and one tuple 7,8,9.

Without the patch I see 6 tuples output (3 copies of 1,2,3 and 3 copies of 7,8,9).
With either of the proposed patches applied I see 4 tuples output (2 copies of 
1,2,3 and 2 copies of 7,8,9).

From looking at other Pig loader functions, it seems like the following would 
address the setLocation issue:

{noformat}
     public void setLocation(String location, Job job) throws IOException {
-        if(AvroStorageUtils.addInputPaths(location, job) && inputAvroSchema == null) {
-            inputAvroSchema = getAvroSchema(location, job);
-        }
+        FileInputFormat.setInputPaths(job, location);
+        inputAvroSchema = getAvroSchema(location, job);
     }
{noformat}

This does resolve the issue for the script I described. However, the 
"addInputPaths" functionality of AvroStorageUtils is lost. I'm wondering why 
it was added rather than just relying on the standard capabilities of LOAD 
(such as globbing).
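
As a rough illustration of why the change above might matter, here is a minimal Java sketch of append versus replace semantics when a setLocation-style method is invoked more than once for the same location. It does not use Hadoop or Pig: FakeJob, addInputPath, setInputPaths and pathsAfterCalls are hypothetical stand-ins written for this example, and the call count of 3 is picked for illustration only. Whether this is the actual cause of the duplicated tuples is an assumption, not something confirmed on this issue.

```java
import java.util.ArrayList;
import java.util.List;

public class SetLocationSketch {

    /** Hypothetical stand-in for the input-path list kept in a job's configuration. */
    static class FakeJob {
        final List<String> inputPaths = new ArrayList<>();
    }

    /** Append semantics: every call adds the location again. */
    static void addInputPath(FakeJob job, String location) {
        job.inputPaths.add(location);
    }

    /** Replace semantics: every call overwrites whatever was registered before. */
    static void setInputPaths(FakeJob job, String location) {
        job.inputPaths.clear();
        job.inputPaths.add(location);
    }

    /** Simulates a setLocation-style method being invoked `calls` times on one job. */
    static int pathsAfterCalls(boolean append, int calls) {
        FakeJob job = new FakeJob();
        for (int i = 0; i < calls; i++) {
            if (append) {
                addInputPath(job, "input_123.avro");
            } else {
                setInputPaths(job, "input_123.avro");
            }
        }
        return job.inputPaths.size();
    }

    public static void main(String[] args) {
        // With append semantics the same file is registered once per call.
        System.out.println("append:  " + pathsAfterCalls(true, 3));
        // With replace semantics repeated calls leave a single entry.
        System.out.println("replace: " + pathsAfterCalls(false, 3));
    }
}
```

In this toy model, a file registered N times would be read N times, which would look like duplicated tuples in the output; replace semantics keep the method safe to call repeatedly.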


I'd be happy to package up my suggestion as a patch if there's interest.


> Fix piggybank unit test TestAvroStorage
> ---------------------------------------
>
>                 Key: PIG-1890
>                 URL: https://issues.apache.org/jira/browse/PIG-1890
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.9.0
>            Reporter: Daniel Dai
>            Assignee: Jakob Homan
>         Attachments: PIG-1890-1.patch, PIG-1890-2.patch
>
>
> TestAvroStorage fail on trunk. There are two reasons:
> 1. After PIG-1680, we call LoadFunc.setLocation one more time.
> 2. The schema for AvroStorage seems to be wrong. For example, in first test 
> case testArrayDefault, the schema for "in" is set to "PIG_WRAPPER: (FIELD: 
> {PIG_WRAPPER: (ARRAY_ELEM: float)})". It seems PIG_WRAPPER is redundant. This 
> issue is hidden until PIG-1188 checked in.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
