Re: [hadoop] AvroMultipleOutputs org.apache.avro.file.DataFileWriter$AppendWriteException
Which version of Hadoop are you using? There's a possibility that the Hadoop environment already has an avro*.jar in place, which would cause a jar conflict.

Regards,
Stanley Shi

On Tue, Mar 4, 2014 at 11:25 PM, John Pauley wrote:
> Outside hadoop: avro-1.7.6
> Inside hadoop: avro-mapred-1.7.6-hadoop2
>
> [earlier quoted messages, stack trace, and sample job snipped; see the original post below]
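One way to test the jar-conflict hypothesis is to log, from inside the running task, which jar actually provides the Avro classes. The sketch below is not from the thread; the helper class and method names are made up for illustration. If the reported location points into the Hadoop installation rather than at the avro-1.7.6 / avro-mapred-1.7.6-hadoop2 jars submitted with the job, a conflicting bundled Avro is likely.

import java.security.CodeSource;

import org.apache.avro.Schema;

/** Hypothetical diagnostic helper: call logAvroLocation() from the mapper's
 *  setup() to see which jar the task JVM loaded Avro from. */
public final class AvroClasspathProbe {
    private AvroClasspathProbe() {
    }

    public static void logAvroLocation() {
        CodeSource src = Schema.class.getProtectionDomain().getCodeSource();
        System.err.println("org.apache.avro.Schema loaded from: "
                + (src == null ? "<unknown>" : src.getLocation()));
    }
}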
Re: [hadoop] AvroMultipleOutputs org.apache.avro.file.DataFileWriter$AppendWriteException
Outside hadoop: avro-1.7.6
Inside hadoop: avro-mapred-1.7.6-hadoop2

From: Stanley Shi <s...@gopivotal.com>
Reply-To: "user@hadoop.apache.org"
Date: Monday, March 3, 2014 at 8:30 PM
To: "user@hadoop.apache.org"
Subject: Re: [hadoop] AvroMultipleOutputs org.apache.avro.file.DataFileWriter$AppendWriteException

which avro version are you using when running outside of hadoop?

Regards,
Stanley Shi

On Mon, Mar 3, 2014 at 11:49 PM, John Pauley <john.pau...@threattrack.com> wrote:
> [original report, stack trace, and sample job snipped; see the original post below]
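If the cluster does ship its own (possibly older) Avro alongside avro-mapred-1.7.6-hadoop2, one common mitigation is to ask MapReduce to put the job's jars ahead of the framework's on the task classpath. The sketch below is illustrative only and assumes a Hadoop 2.x cluster where the mapreduce.job.user.classpath.first property is honoured; the driver class name is made up, and the property name should be verified against your distribution.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class UserClasspathFirstDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Prefer the jars submitted with the job (avro-1.7.6,
        // avro-mapred-1.7.6-hadoop2) over any Avro bundled with the cluster.
        conf.setBoolean("mapreduce.job.user.classpath.first", true);
        Job job = Job.getInstance(conf, "avro-multiple-outputs-example");
        // ... remainder of the job setup as in the sample driver ...
    }
}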
Re: [hadoop] AvroMultipleOutputs org.apache.avro.file.DataFileWriter$AppendWriteException
which avro version are you using when running outside of hadoop?

Regards,
Stanley Shi

On Mon, Mar 3, 2014 at 11:49 PM, John Pauley wrote:
> [original report, stack trace, and sample job snipped; see the original post below]
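For completeness, one quick way to confirm the Avro version on the local (non-Hadoop) classpath is to read the jar's manifest metadata. This is only a sketch, not from the thread: the class name is made up, and it prints null if the jar's manifest does not carry an Implementation-Version entry.

import org.apache.avro.Schema;

public class PrintAvroVersion {
    public static void main(String[] args) {
        // Implementation-Version from the manifest of the jar that provides
        // org.apache.avro.Schema, if the manifest declares it.
        Package avroPackage = Schema.class.getPackage();
        System.out.println("Avro implementation version: "
                + (avroPackage == null ? null : avroPackage.getImplementationVersion()));
    }
}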
[hadoop] AvroMultipleOutputs org.apache.avro.file.DataFileWriter$AppendWriteException
This is cross posted to the avro-user list (http://mail-archives.apache.org/mod_mbox/avro-user/201402.mbox/%3ccf3612f6.94d2%25john.pau...@threattrack.com%3e).

Hello all,

I’m having an issue using AvroMultipleOutputs in a map/reduce job. The issue occurs when using a schema that has a union of null and a fixed (among other complex types), with a default of null, and the value is not null. Please find the full stack trace below and a sample map/reduce job that generates an Avro container file and uses that as the m/r input. Note that I can serialize/deserialize without issue using GenericDatumWriter/GenericDatumReader outside of hadoop… Any insight would be helpful.

Stack trace:

java.lang.Exception: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.NullPointerException: in com.foo.bar.simple_schema in union null of union in field baz of com.foo.bar.simple_schema
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:404)
Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.NullPointerException: in com.foo.bar.simple_schema in union null of union in field baz of com.foo.bar.simple_schema
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:296)
    at org.apache.avro.mapreduce.AvroKeyRecordWriter.write(AvroKeyRecordWriter.java:77)
    at org.apache.avro.mapreduce.AvroKeyRecordWriter.write(AvroKeyRecordWriter.java:39)
    at org.apache.avro.mapreduce.AvroMultipleOutputs.write(AvroMultipleOutputs.java:400)
    at org.apache.avro.mapreduce.AvroMultipleOutputs.write(AvroMultipleOutputs.java:378)
    at com.tts.ox.mapreduce.example.avro.AvroContainerFileDriver$SampleMapper.map(AvroContainerFileDriver.java:78)
    at com.tts.ox.mapreduce.example.avro.AvroContainerFileDriver$SampleMapper.map(AvroContainerFileDriver.java:62)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:695)
Caused by: java.lang.NullPointerException: in com.foo.bar.simple_schema in union null of union in field baz of com.foo.bar.simple_schema
    at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:290)
    ... 16 more
Caused by: java.lang.NullPointerException
    at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:457)
    at org.apache.avro.specific.SpecificData.getSchema(SpecificData.java:189)
    at org.apache.avro.reflect.ReflectData.isRecord(ReflectData.java:167)
    at org.apache.avro.generic.GenericData.getSchemaName(GenericData.java:608)
    at org.apache.avro.specific.SpecificData.getSchemaName(SpecificData.java:265)
    at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:597)
    at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:151)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:71)
    at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
    at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
    at org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:175)
    at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
    at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)

Sample m/r job:

package com.tts.ox.mapreduce.example.avro;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyInputFormat;
import org.apache.avro.mapreduce.AvroKeyOutputFormat;
import org.apache.avro.mapreduce.AvroMultipleOutputs;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.out
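Since the thread does not include the actual com.foo.bar.simple_schema, the sketch below reconstructs a schema matching the description (field baz is a union of null and a fixed type, defaulting to null) and writes a record whose baz value is not null via the plain GenericDatumWriter/DataFileWriter path, i.e. the serialization that reportedly works outside of Hadoop. The schema layout, fixed name, byte values, and output file name are assumptions for illustration only.

import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;

public class UnionOfFixedRoundTrip {
    // Hypothetical stand-in for com.foo.bar.simple_schema: "baz" is a union of
    // null and a fixed(4), with a default of null.
    private static final String SCHEMA_JSON =
            "{\"type\":\"record\",\"name\":\"simple_schema\",\"namespace\":\"com.foo.bar\","
          + " \"fields\":[{\"name\":\"baz\","
          + "  \"type\":[\"null\",{\"type\":\"fixed\",\"name\":\"baz_fixed\",\"size\":4}],"
          + "  \"default\":null}]}";

    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        Schema fixedSchema = schema.getField("baz").schema().getTypes().get(1);

        // Populate the union branch with a non-null fixed value, as in the failing case.
        GenericRecord record = new GenericRecordBuilder(schema)
                .set("baz", new GenericData.Fixed(fixedSchema, new byte[] {1, 2, 3, 4}))
                .build();

        // Plain GenericDatumWriter path, which the original report says succeeds
        // outside of the map/reduce job.
        DataFileWriter<GenericRecord> writer =
                new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
        writer.create(schema, new File("simple_schema.avro"));
        writer.append(record);
        writer.close();
    }
}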