Re: [hadoop] AvroMultipleOutputs org.apache.avro.file.DataFileWriter$AppendWriteException

2014-03-04 Thread Stanley Shi
Which version of hadoop are you using?
There's a possibility that the hadoop environment already has an avro*.jar
in place, which would cause a jar conflict.
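
One quick way to check (a sketch, not from this thread: the class name AvroJarCheck and the idea of printing the code source are mine) is to log which jar an Avro class is actually loaded from, either in a standalone main() or from the mapper's setup():

import org.apache.avro.Schema;

/** Prints the jar (or directory) that org.apache.avro.Schema was loaded from. */
public class AvroJarCheck {
    public static void main(String[] args) {
        System.out.println(
            Schema.class.getProtectionDomain().getCodeSource().getLocation());
    }
}

If that points at a cluster-provided avro jar rather than the one bundled with the job, setting "mapreduce.job.user.classpath.first" to true on the job configuration is a common workaround for Hadoop 2 MR jobs (verify the exact property name for your version).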

Regards,
Stanley Shi,



On Tue, Mar 4, 2014 at 11:25 PM, John Pauley wrote:

>  Outside hadoop: avro-1.7.6
> Inside hadoop:  avro-mapred-1.7.6-hadoop2
>
>
>   From: Stanley Shi 
> Reply-To: "user@hadoop.apache.org" 
> Date: Monday, March 3, 2014 at 8:30 PM
> To: "user@hadoop.apache.org" 
> Subject: Re: [hadoop] AvroMultipleOutputs
> org.apache.avro.file.DataFileWriter$AppendWriteException
>
>   which avro version are you using when running outside of hadoop?
>
>  Regards,
> Stanley Shi,
>
>
>
> On Mon, Mar 3, 2014 at 11:49 PM, John Pauley 
> wrote:
>
>>   This is cross posted to avro-user list (
>> http://mail-archives.apache.org/mod_mbox/avro-user/201402.mbox/%3ccf3612f6.94d2%25john.pau...@threattrack.com%3e
>> ).
>>
>>   Hello all,
>>
>>  I’m having an issue using AvroMultipleOutputs in a map/reduce job.  The
>> issue occurs when using a schema that has a field whose type is a union of
>> null and a fixed (among other complex types), defaulting to null, and the
>> field's value is not null.
>>  Please find the full stack trace below and a sample map/reduce job that
>> generates an Avro container file and uses that for the m/r input.  Note
>> that I can serialize/deserialize without issue using
>> GenericDatumWriter/GenericDatumReader outside of hadoop…  Any insight would
>> be helpful.
>>
>>  Stack trace:
>>  java.lang.Exception:
>> org.apache.avro.file.DataFileWriter$AppendWriteException:
>> java.lang.NullPointerException: in com.foo.bar.simple_schema in union null
>> of union in field baz of com.foo.bar.simple_schema
>> at
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:404)
>> Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException:
>> java.lang.NullPointerException: in com.foo.bar.simple_schema in union null
>> of union in field baz of com.foo.bar.simple_schema
>> at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:296)
>> at
>> org.apache.avro.mapreduce.AvroKeyRecordWriter.write(AvroKeyRecordWriter.java:77)
>> at
>> org.apache.avro.mapreduce.AvroKeyRecordWriter.write(AvroKeyRecordWriter.java:39)
>> at
>> org.apache.avro.mapreduce.AvroMultipleOutputs.write(AvroMultipleOutputs.java:400)
>> at
>> org.apache.avro.mapreduce.AvroMultipleOutputs.write(AvroMultipleOutputs.java:378)
>> at
>> com.tts.ox.mapreduce.example.avro.AvroContainerFileDriver$SampleMapper.map(AvroContainerFileDriver.java:78)
>> at
>> com.tts.ox.mapreduce.example.avro.AvroContainerFileDriver$SampleMapper.map(AvroContainerFileDriver.java:62)
>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
>> at
>> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:266)
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>> at java.lang.Thread.run(Thread.java:695)
>> Caused by: java.lang.NullPointerException: in com.foo.bar.simple_schema
>> in union null of union in field baz of com.foo.bar.simple_schema
>> at
>> org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
>> at
>> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
>> at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:290)
>> ... 16 more
>> Caused by: java.lang.NullPointerException
>> at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:457)
>> at org.apache.avro.specific.SpecificData.getSchema(SpecificData.java:189)
>> at org.apache.avro.reflect.ReflectData.isRecord(ReflectData.java:167)
>> at org.apache.avro.generic.GenericData.getSchemaName(GenericData.java:608)
>> at
>> org.apache.avro.specific.SpecificData.getSchemaName(SpecificData.java:265)
>> at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:597)
>> at
>> org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:151)
>> at
>> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:71)
>> at
>> org.apache.avro

Re: [hadoop] AvroMultipleOutputs org.apache.avro.file.DataFileWriter$AppendWriteException

2014-03-04 Thread John Pauley
Outside hadoop: avro-1.7.6
Inside hadoop:  avro-mapred-1.7.6-hadoop2

From: Stanley Shi <s...@gopivotal.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Monday, March 3, 2014 at 8:30 PM
To: "user@hadoop.apache.org<mailto:user@hadoop.apache.org>" 
mailto:user@hadoop.apache.org>>
Subject: Re: [hadoop] AvroMultipleOutputs 
org.apache.avro.file.DataFileWriter$AppendWriteException

which avro version are you using when running outside of hadoop?

Regards,
Stanley Shi,


On Mon, Mar 3, 2014 at 11:49 PM, John Pauley <john.pau...@threattrack.com> wrote:
This is cross posted to avro-user list 
(http://mail-archives.apache.org/mod_mbox/avro-user/201402.mbox/%3ccf3612f6.94d2%25john.pau...@threattrack.com%3e).

Hello all,

I’m having an issue using AvroMultipleOutputs in a map/reduce job.  The issue 
occurs when using a schema that has a field whose type is a union of null and a 
fixed (among other complex types), defaulting to null, and the field's value is 
not null.  Please find the full 
stack trace below and a sample map/reduce job that generates an Avro container 
file and uses that for the m/r input.  Note that I can serialize/deserialize 
without issue using GenericDatumWriter/GenericDatumReader outside of hadoop…  
Any insight would be helpful.

Stack trace:
java.lang.Exception: org.apache.avro.file.DataFileWriter$AppendWriteException: 
java.lang.NullPointerException: in com.foo.bar.simple_schema in union null of 
union in field baz of com.foo.bar.simple_schema
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:404)
Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: 
java.lang.NullPointerException: in com.foo.bar.simple_schema in union null of 
union in field baz of com.foo.bar.simple_schema
at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:296)
at 
org.apache.avro.mapreduce.AvroKeyRecordWriter.write(AvroKeyRecordWriter.java:77)
at 
org.apache.avro.mapreduce.AvroKeyRecordWriter.write(AvroKeyRecordWriter.java:39)
at 
org.apache.avro.mapreduce.AvroMultipleOutputs.write(AvroMultipleOutputs.java:400)
at 
org.apache.avro.mapreduce.AvroMultipleOutputs.write(AvroMultipleOutputs.java:378)
at 
com.tts.ox.mapreduce.example.avro.AvroContainerFileDriver$SampleMapper.map(AvroContainerFileDriver.java:78)
at 
com.tts.ox.mapreduce.example.avro.AvroContainerFileDriver$SampleMapper.map(AvroContainerFileDriver.java:62)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:695)
Caused by: java.lang.NullPointerException: in com.foo.bar.simple_schema in 
union null of union in field baz of com.foo.bar.simple_schema
at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:290)
... 16 more
Caused by: java.lang.NullPointerException
at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:457)
at org.apache.avro.specific.SpecificData.getSchema(SpecificData.java:189)
at org.apache.avro.reflect.ReflectData.isRecord(ReflectData.java:167)
at org.apache.avro.generic.GenericData.getSchemaName(GenericData.java:608)
at org.apache.avro.specific.SpecificData.getSchemaName(SpecificData.java:265)
at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:597)
at 
org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:151)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:71)
at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
at 
org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
at 
org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:175)
at 
org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)

Sample m/r job:

package com.tts.ox.mapreduce.example.avro;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
i

Re: [hadoop] AvroMultipleOutputs org.apache.avro.file.DataFileWriter$AppendWriteException

2014-03-03 Thread Stanley Shi
which avro version are you using when running outside of hadoop?

Regards,
Stanley Shi,



On Mon, Mar 3, 2014 at 11:49 PM, John Pauley wrote:

>   This is cross posted to avro-user list (
> http://mail-archives.apache.org/mod_mbox/avro-user/201402.mbox/%3ccf3612f6.94d2%25john.pau...@threattrack.com%3e
> ).
>
>   Hello all,
>
>  I’m having an issue using AvroMultipleOutputs in a map/reduce job.  The
> issue occurs when using a schema that has a field whose type is a union of
> null and a fixed (among other complex types), defaulting to null, and the
> field's value is not null.  Please
> find the full stack trace below and a sample map/reduce job that generates
> an Avro container file and uses that for the m/r input.  Note that I can
> serialize/deserialize without issue using
> GenericDatumWriter/GenericDatumReader outside of hadoop…  Any insight would
> be helpful.
>
>  Stack trace:
>  java.lang.Exception:
> org.apache.avro.file.DataFileWriter$AppendWriteException:
> java.lang.NullPointerException: in com.foo.bar.simple_schema in union null
> of union in field baz of com.foo.bar.simple_schema
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:404)
> Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException:
> java.lang.NullPointerException: in com.foo.bar.simple_schema in union null
> of union in field baz of com.foo.bar.simple_schema
> at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:296)
> at
> org.apache.avro.mapreduce.AvroKeyRecordWriter.write(AvroKeyRecordWriter.java:77)
> at
> org.apache.avro.mapreduce.AvroKeyRecordWriter.write(AvroKeyRecordWriter.java:39)
> at
> org.apache.avro.mapreduce.AvroMultipleOutputs.write(AvroMultipleOutputs.java:400)
> at
> org.apache.avro.mapreduce.AvroMultipleOutputs.write(AvroMultipleOutputs.java:378)
> at
> com.tts.ox.mapreduce.example.avro.AvroContainerFileDriver$SampleMapper.map(AvroContainerFileDriver.java:78)
> at
> com.tts.ox.mapreduce.example.avro.AvroContainerFileDriver$SampleMapper.map(AvroContainerFileDriver.java:62)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
> at
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> at java.lang.Thread.run(Thread.java:695)
> Caused by: java.lang.NullPointerException: in com.foo.bar.simple_schema in
> union null of union in field baz of com.foo.bar.simple_schema
> at
> org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
> at
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
> at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:290)
> ... 16 more
> Caused by: java.lang.NullPointerException
> at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:457)
> at org.apache.avro.specific.SpecificData.getSchema(SpecificData.java:189)
> at org.apache.avro.reflect.ReflectData.isRecord(ReflectData.java:167)
> at org.apache.avro.generic.GenericData.getSchemaName(GenericData.java:608)
> at
> org.apache.avro.specific.SpecificData.getSchemaName(SpecificData.java:265)
> at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:597)
> at
> org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:151)
> at
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:71)
> at
> org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
> at
> org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
> at
> org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:175)
> at
> org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> at
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
> at
> org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
>
>  Sample m/r job:
> 
>  package com.tts.ox.mapreduce.example.avro;
>
>  import org.apache.avro.Schema;
> import org.apache.avro.file.DataFileWriter;
> import org.apache.avro.generic.GenericData;
> import org.apache.avro.generic.GenericDatumWriter;
> import org.apache.avro.generic.GenericRecord;
> import org.apache.avro.generic.GenericRecordBuilder;
> import org.apache.avro.io.DatumWriter;
> import org.apache.avro.mapred.AvroKey;
> import org.apache.avro.mapreduce.AvroJob;
> import org.apache.avro.mapreduce.AvroKeyInputFormat;
> import org.apache.avro.mapreduce.AvroKeyOutputFormat;
> import org.apache.avro.mapreduce.AvroMultipleOutputs;
> import org.apache.hado

[hadoop] AvroMultipleOutputs org.apache.avro.file.DataFileWriter$AppendWriteException

2014-03-03 Thread John Pauley
This is cross posted to avro-user list 
(http://mail-archives.apache.org/mod_mbox/avro-user/201402.mbox/%3ccf3612f6.94d2%25john.pau...@threattrack.com%3e).

Hello all,

I’m having an issue using AvroMultipleOutputs in a map/reduce job.  The issue 
occurs when using a schema that has a field whose type is a union of null and a 
fixed (among other complex types), defaulting to null, and the field's value is 
not null.  Please find the full 
stack trace below and a sample map/reduce job that generates an Avro container 
file and uses that for the m/r input.  Note that I can serialize/deserialize 
without issue using GenericDatumWriter/GenericDatumReader outside of hadoop…  
Any insight would be helpful.
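
For reference, this is a minimal standalone sketch of the shape described above: a record whose "baz" field is a union of null and a fixed, defaulting to null, populated with a non-null fixed value and written with a plain GenericDatumWriter (the case that works outside hadoop). The fixed type name "baz_fixed" and its size are invented for illustration and are not the actual schema:

import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class UnionOfNullAndFixedDemo {
    // Field "baz" is ["null", fixed], default null, matching the description above.
    private static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"simple_schema\",\"namespace\":\"com.foo.bar\","
      + "\"fields\":[{\"name\":\"baz\",\"type\":[\"null\","
      + "{\"type\":\"fixed\",\"name\":\"baz_fixed\",\"size\":4}],\"default\":null}]}";

    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        Schema fixedSchema = schema.getField("baz").schema().getTypes().get(1);

        // Populate the nullable union with a non-null fixed value.
        GenericRecord record = new GenericRecordBuilder(schema)
            .set("baz", new GenericData.Fixed(fixedSchema, new byte[] {1, 2, 3, 4}))
            .build();

        // Serializing with GenericDatumWriter succeeds outside the m/r job.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        GenericDatumWriter<GenericRecord> writer =
            new GenericDatumWriter<GenericRecord>(schema);
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        writer.write(record, encoder);
        encoder.flush();
        System.out.println("wrote " + out.size() + " bytes");
    }
}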

Stack trace:
java.lang.Exception: org.apache.avro.file.DataFileWriter$AppendWriteException: 
java.lang.NullPointerException: in com.foo.bar.simple_schema in union null of 
union in field baz of com.foo.bar.simple_schema
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:404)
Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: 
java.lang.NullPointerException: in com.foo.bar.simple_schema in union null of 
union in field baz of com.foo.bar.simple_schema
at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:296)
at 
org.apache.avro.mapreduce.AvroKeyRecordWriter.write(AvroKeyRecordWriter.java:77)
at 
org.apache.avro.mapreduce.AvroKeyRecordWriter.write(AvroKeyRecordWriter.java:39)
at 
org.apache.avro.mapreduce.AvroMultipleOutputs.write(AvroMultipleOutputs.java:400)
at 
org.apache.avro.mapreduce.AvroMultipleOutputs.write(AvroMultipleOutputs.java:378)
at 
com.tts.ox.mapreduce.example.avro.AvroContainerFileDriver$SampleMapper.map(AvroContainerFileDriver.java:78)
at 
com.tts.ox.mapreduce.example.avro.AvroContainerFileDriver$SampleMapper.map(AvroContainerFileDriver.java:62)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:695)
Caused by: java.lang.NullPointerException: in com.foo.bar.simple_schema in 
union null of union in field baz of com.foo.bar.simple_schema
at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:290)
... 16 more
Caused by: java.lang.NullPointerException
at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:457)
at org.apache.avro.specific.SpecificData.getSchema(SpecificData.java:189)
at org.apache.avro.reflect.ReflectData.isRecord(ReflectData.java:167)
at org.apache.avro.generic.GenericData.getSchemaName(GenericData.java:608)
at org.apache.avro.specific.SpecificData.getSchemaName(SpecificData.java:265)
at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:597)
at 
org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:151)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:71)
at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
at 
org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
at 
org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:175)
at 
org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)

Sample m/r job:

package com.tts.ox.mapreduce.example.avro;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyInputFormat;
import org.apache.avro.mapreduce.AvroKeyOutputFormat;
import org.apache.avro.mapreduce.AvroMultipleOutputs;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.out