Sebastian Geller created PIG-5108:
-------------------------------------

             Summary: AvroStorage on Tez with exception on nested records
                 Key: PIG-5108
                 URL: https://issues.apache.org/jira/browse/PIG-5108
             Project: Pig
          Issue Type: Bug
          Components: tez
    Affects Versions: 0.16.0
         Environment: HadoopVersion: 2.6.0-cdh5.8.0
                      PigVersion: 0.16.0
                      TezVersion: 0.7.0
            Reporter: Sebastian Geller
            Assignee: Nandor Kollar


Hi,

While migrating to the latest Pig version, we ran into an issue that occurs 
whenever nested Avro records are used on Tez:

{code}
Caused by: java.io.IOException: class org.apache.pig.impl.util.avro.AvroTupleWrapper.write called, but not implemented yet
        at org.apache.pig.impl.util.avro.AvroTupleWrapper.write(AvroTupleWrapper.java:68)
        at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:139)
...
{code}
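
To make the trace easier to read, here is a minimal, self-contained sketch of the failure pattern it suggests: a container's write() delegates to the wrapped value, and the value's write() is a stub that throws. All class and method names below (SketchWritable, StubTupleWrapper, StubNullableWritable, trySerialize) are hypothetical stand-ins invented for illustration; this is not Pig's actual code, and it does not explain why the Tez path serializes the wrapper while the MapReduce run succeeds.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WriteNotImplementedSketch {

    /** Hypothetical stand-in for a Writable-style serialization contract. */
    interface SketchWritable {
        void write(DataOutput out) throws IOException;
    }

    /** Hypothetical tuple wrapper whose write() is an unimplemented stub. */
    static final class StubTupleWrapper implements SketchWritable {
        @Override
        public void write(DataOutput out) throws IOException {
            // Assumption: mirrors the shape of the message seen in the trace.
            throw new IOException(getClass() + ".write called, but not implemented yet");
        }
    }

    /** Hypothetical container that writes a null flag, then delegates to its value. */
    static final class StubNullableWritable implements SketchWritable {
        private final SketchWritable value;

        StubNullableWritable(SketchWritable value) {
            this.value = value;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeBoolean(value == null);
            if (value != null) {
                value.write(out); // the stub's exception surfaces from here
            }
        }
    }

    /** Serializes the value; returns the error message, or null on success. */
    static String trySerialize(SketchWritable value) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            new StubNullableWritable(value).write(out);
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(trySerialize(new StubTupleWrapper()));
    }
}
```

In this sketch the error only appears once something actually serializes the wrapped value, which matches the two-frame trace above (container write calling wrapper write).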

The setup is as follows.

Schema:
{code}
{
    "fields": [
        {
            "name": "id",
            "type": "int"
        },
        {
            "name": "property",
            "type": {
                "fields": [
                    {
                        "name": "id",
                        "type": "int"
                    }
                ],
                "name": "Property",
                "type": "record"
            }
        }
    ],
    "name": "Person",
    "namespace": "com.github.ouyi.avro",
    "type": "record"
}
{code}

Pig script (group_person.pig):
{code}
loaded_person =
    LOAD '$input'
    USING AvroStorage();

grouped_records =
    GROUP
        loaded_person BY (property.id);

STORE grouped_records
    INTO '$output'
    USING AvroStorage();
{code}

Sample data:
{code}
{"id":1,"property":{"id":1}}
{code}

Execution on Tez (local mode):
{code}
pig -x tez_local -p input=file:///usr/lib/pig/pig-0.16.0/person-prop.avro -p output=file:///output group_person.pig
...
Caused by: java.io.IOException: class org.apache.pig.impl.util.avro.AvroTupleWrapper.write called, but not implemented yet
        at org.apache.pig.impl.util.avro.AvroTupleWrapper.write(AvroTupleWrapper.java:68)
        at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:139)
...
{code}

Execution on MapReduce (local mode):
{code}
pig -x local -p input=file:///usr/lib/pig/pig-0.16.0/person-prop.avro -p output=file:///output7 group_person.pig
...
Output(s):
Successfully stored 1 records in: "file:///output7"
...
{code}

I will attach the complete log files of both runs.

I assume the same Pig script should work on both Tez and MapReduce. Is 
there any underlying change when migrating to Tez that makes this schema 
invalid?

Thanks,
Sebastian



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
