Sebastian Geller created PIG-5108:
-------------------------------------
Summary: AvroStorage on Tez with exception on nested records
Key: PIG-5108
URL: https://issues.apache.org/jira/browse/PIG-5108
Project: Pig
Issue Type: Bug
Components: tez
Affects Versions: 0.16.0
Environment: HadoopVersion: 2.6.0-cdh5.8.0
PigVersion: 0.16.0
TezVersion: 0.7.0
Reporter: Sebastian Geller
Assignee: Nandor Kollar
Hi,
While migrating to the latest Pig version we have seen a general issue when
using nested Avro records on Tez:
{code}
Caused by: java.io.IOException: class org.apache.pig.impl.util.avro.AvroTupleWrapper.write called, but not implemented yet
    at org.apache.pig.impl.util.avro.AvroTupleWrapper.write(AvroTupleWrapper.java:68)
    at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:139)
...
{code}
The setup is as follows.
Schema:
{code}
{
  "fields": [
    {
      "name": "id",
      "type": "int"
    },
    {
      "name": "property",
      "type": {
        "fields": [
          {
            "name": "id",
            "type": "int"
          }
        ],
        "name": "Property",
        "type": "record"
      }
    }
  ],
  "name": "Person",
  "namespace": "com.github.ouyi.avro",
  "type": "record"
}
{code}
Pig script (group_person.pig):
{code}
loaded_person =
    LOAD '$input'
    USING AvroStorage();
grouped_records =
    GROUP
        loaded_person BY (property.id);
STORE grouped_records
    INTO '$output'
    USING AvroStorage();
{code}
Sample data:
{code}
{"id":1,"property":{"id":1}}
{code}
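For reference, the grouping that the script is expected to produce on this sample can be modelled outside Pig. The following is only a plain-Python sketch of the GROUP BY (property.id) step, not Pig or AvroStorage code; the names SAMPLE_LINES and group_by_property_id are hypothetical and exist only for this illustration.

```python
import json
from collections import defaultdict

# The single sample record from the report, one JSON document per line.
SAMPLE_LINES = ['{"id":1,"property":{"id":1}}']


def group_by_property_id(lines):
    """Group records by the nested property.id field.

    This mimics the intent of `GROUP loaded_person BY (property.id)`;
    it does not reproduce how Pig serializes tuples between stages.
    """
    groups = defaultdict(list)
    for line in lines:
        record = json.loads(line)
        groups[record["property"]["id"]].append(record)
    return dict(groups)


if __name__ == "__main__":
    for key, records in group_by_property_id(SAMPLE_LINES).items():
        print(key, records)
```

With the sample above this yields a single group keyed by the nested id, which matches the one stored record reported by the MapReduce run.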
Execution on Tez:
{code}
pig -x tez_local -p input=file:///usr/lib/pig/pig-0.16.0/person-prop.avro -p output=file:///output group_person.pig
...
Caused by: java.io.IOException: class org.apache.pig.impl.util.avro.AvroTupleWrapper.write called, but not implemented yet
    at org.apache.pig.impl.util.avro.AvroTupleWrapper.write(AvroTupleWrapper.java:68)
    at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:139)
...
{code}
Execution on MapReduce:
{code}
pig -x local -p input=file:///usr/lib/pig/pig-0.16.0/person-prop.avro -p output=file:///output7 group_person.pig
...
Output(s):
Successfully stored 1 records in: "file:///output7"
...
{code}
I will attach the complete log files of both runs.
I assume the Pig script should work regardless of whether it runs on Tez or
MapReduce. Is there any underlying change when migrating to Tez that makes
the schema invalid?
Thanks,
Sebastian