Anton Oellerer created AVRO-2787:
------------------------------------

             Summary: Hadoop Mapreduce job fails when creating Writer
                 Key: AVRO-2787
                 URL: https://issues.apache.org/jira/browse/AVRO-2787
             Project: Apache Avro
          Issue Type: Bug
          Components: docker, java
    Affects Versions: 1.9.2
         Environment: Development
 * OS: Fedora 31
 * Java version 8
 * Gradle version 6.2.2
 * Avro version 1.9.2
 * Shadow version 5.2.0
 * Gradle-avro-plugin version 0.19.1

Running in a Podman container
 * OS: Ubuntu 18.04
 * Podman 1.8.2
 * Hadoop version 3.2.1
 * Java version 8
            Reporter: Anton Oellerer
         Attachments: CategoryData.avsc, CategoryTokensReducer.java, TextprocessingfundamentalsApplication.java

Hey,

I am trying to build a Hadoop pipeline that computes the chi-squared value for tokens in reviews stored as JSON.

For this, I created multiple Hadoop jobs, and the communication between them happens partly through Avro data containers.

When I run this pipeline, the first reduce job fails at its end with the error below. The reducer signature is:
{code:java}
public class CategoryTokensReducer extends Reducer<Text, StringArrayWritable, AvroKey<CharSequence>, AvroValue<CategoryData>>
{code}

Error:
{code:java}
java.lang.Exception: java.lang.NoSuchMethodError: org.apache.avro.Schema$Field.<init>(Ljava/lang/String;Lorg/apache/avro/Schema;Ljava/lang/String;Ljava/lang/Object;)V
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:559)
Caused by: java.lang.NoSuchMethodError: org.apache.avro.Schema$Field.<init>(Ljava/lang/String;Lorg/apache/avro/Schema;Ljava/lang/String;Ljava/lang/Object;)V
        at org.apache.avro.hadoop.io.AvroKeyValue.getSchema(AvroKeyValue.java:111)
        at org.apache.avro.mapreduce.AvroKeyValueRecordWriter.<init>(AvroKeyValueRecordWriter.java:84)
        at org.apache.avro.mapreduce.AvroKeyValueOutputFormat.getRecordWriter(AvroKeyValueOutputFormat.java:70)
        at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:542)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:615)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:347)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
{code}
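Since the failure is a {{NoSuchMethodError}} on one specific {{Schema.Field}} constructor, whether that constructor is actually visible on the runtime classpath can be probed with reflection. A minimal, self-contained sketch (the helper name {{hasConstructor}} is mine, and {{java.lang.String}} stands in for {{org.apache.avro.Schema$Field}} so the snippet runs without Avro on the classpath; against the real classpath one would pass {{Schema.Field.class}} with parameter types {{String.class, Schema.class, String.class, Object.class}}):
{code:java}
import java.lang.reflect.Constructor;

public class ConstructorCheck {
    // Returns true if `cls` declares a constructor with exactly these
    // parameter types, false if that signature is absent -- the situation
    // that surfaces as a NoSuchMethodError at link time.
    public static boolean hasConstructor(Class<?> cls, Class<?>... paramTypes) {
        try {
            Constructor<?> c = cls.getDeclaredConstructor(paramTypes);
            return c != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // String(char[]) exists on every JDK; String(Thread) does not.
        System.out.println(hasConstructor(String.class, char[].class));  // true
        System.out.println(hasConstructor(String.class, Thread.class));  // false
    }
}
{code}
If the probe returns false for the four-argument signature, an older Avro version (for example one bundled with the Hadoop distribution) is likely shadowing the 1.9.2 jar on the task classpath.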
The job is set up like this:
{code:java}
Job jsonToCategoryTokensJob = Job.getInstance(conf, "json to category data");
AvroJob.setOutputKeySchema(jsonToCategoryTokensJob, Schema.create(Schema.Type.STRING));
AvroJob.setOutputValueSchema(jsonToCategoryTokensJob, CategoryData.getClassSchema());

jsonToCategoryTokensJob.setJarByClass(TextprocessingfundamentalsApplication.class);

jsonToCategoryTokensJob.setMapperClass(JsonToCategoryTokensMapper.class);
jsonToCategoryTokensJob.setMapOutputKeyClass(Text.class);
jsonToCategoryTokensJob.setMapOutputValueClass(StringArrayWritable.class);

jsonToCategoryTokensJob.setReducerClass(CategoryTokensReducer.class);
jsonToCategoryTokensJob.setOutputFormatClass(AvroKeyValueOutputFormat.class);

String in = otherArgs.get(0);
String out = otherArgs.get(1);

FileInputFormat.addInputPath(jsonToCategoryTokensJob, new Path(in));
FileOutputFormat.setOutputPath(jsonToCategoryTokensJob, new Path(out, "outCategoryData"));
{code}
Does anyone know what the problem might be?

Best regards

Anton



--
This message was sent by Atlassian Jira
(v8.3.4#803005)