[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property
[ https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15562979#comment-15562979 ]

Siddharth Seth commented on TEZ-3330:
-------------------------------------

Thanks for the review. Committing. I think the findbugs warning is being fixed by TEZ-3464.

> Error on avro M/R job with Tez: missing configuration property
> --------------------------------------------------------------
>
>                 Key: TEZ-3330
>                 URL: https://issues.apache.org/jira/browse/TEZ-3330
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Manuel Godbert
>            Assignee: Siddharth Seth
>              Labels: newbie
>         Attachments: TEZ-3330.01.patch, TEZ-3330.temp.2.patch, TEZ-3330.temp.patch
>
>
> I tried running the simple avro M/R job MapredColorCount, which I found in the
> examples of avro release 1.7.7.
> It failed with the following trace:
> {code}
> errorMessage=Shuffle Runner Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: Error while doing final merge
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:378)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:337)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.NullPointerException
>     at java.io.StringReader.<init>(StringReader.java:50)
>     at org.apache.avro.Schema$Parser.parse(Schema.java:917)
>     at org.apache.avro.Schema.parse(Schema.java:966)
>     at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
>     at org.apache.avro.mapred.AvroKeyComparator.setConf(AvroKeyComparator.java:39)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
>     at org.apache.tez.runtime.library.common.ConfigUtils.getIntermediateInputKeyComparator(ConfigUtils.java:133)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.finalMerge(MergeManager.java:915)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:540)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:376)
>     ... 6 more
> {code}
> Digging a bit, I saw that during shuffle Tez can't access some of the
> configuration properties of the job. In our example, it is
> avro.output.schema that is missing.
> With some more complicated code I could get one step further, and a similar
> issue happened when the valuesIterator for the reducer was being built:
> {code}
> java.lang.NullPointerException
>     at java.io.StringReader.<init>(StringReader.java:50)
>     at org.apache.avro.Schema$Parser.parse(Schema.java:917)
>     at org.apache.avro.Schema.parse(Schema.java:966)
>     at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
>     at org.apache.avro.mapred.AvroSerialization.getDeserializer(AvroSerialization.java:53)
>     at org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:90)
>     at org.apache.tez.runtime.library.common.ValuesIterator.<init>(ValuesIterator.java:80)
>     at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.createValuesIterator(OrderedGroupedKVInput.java:287)
> {code}
> I am using HDP2.4, Tez 0.7.0, avro 1.7.4

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
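
Both traces in the report bottom out in the `java.io.StringReader` constructor, which is consistent with a null schema string being read from a configuration that lacks `avro.output.schema`. The following is a minimal sketch of that failure pattern using only the JDK. A plain `Map` stands in for the job configuration, and the class and method names (`MissingSchemaDemo`, `openSchema`, `failsWithNpe`) are hypothetical and not part of Avro, Hadoop, or Tez.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: a Map plays the role of the job configuration.
final class MissingSchemaDemo {
    // Property name taken from the issue report.
    static final String SCHEMA_KEY = "avro.output.schema";

    // Looks the schema up and hands it to StringReader without a null check.
    // StringReader's constructor throws NullPointerException for a null string.
    static StringReader openSchema(Map<String, String> conf) {
        return new StringReader(conf.get(SCHEMA_KEY));
    }

    // Returns true when opening the schema fails with a NullPointerException.
    static boolean failsWithNpe(Map<String, String> conf) {
        try {
            openSchema(conf);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Configuration as seen where the job is set up: property present.
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put(SCHEMA_KEY, "\"string\"");

        // Configuration as seen by a component that never received the property.
        Map<String, String> shuffleConf = new HashMap<>();

        System.out.println("job conf fails: " + failsWithNpe(jobConf));
        System.out.println("shuffle conf fails: " + failsWithNpe(shuffleConf));
    }
}
```

The sketch only illustrates why a missing property surfaces as a `NullPointerException` deep inside schema parsing rather than as a clear "property not set" error; it says nothing about how Tez itself propagates configuration.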
[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property
[ https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556156#comment-15556156 ]

Hitesh Shah commented on TEZ-3330:
----------------------------------

+1 - not sure about the findbugs which is in tez-dag and therefore unrelated.
[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property
[ https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556137#comment-15556137 ]

TezQA commented on TEZ-3330:
----------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12832178/TEZ-3330.01.patch
  against master revision dceb365.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 3.0.1) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2025//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/2025//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2025//console

This message is automatically generated.
[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property
[ https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15506084#comment-15506084 ]

Manuel Godbert commented on TEZ-3330:
-------------------------------------

I am afraid I do not understand what you expect from me. I am not used to git patches, just basic push and pull... I will let you finalize the work!
[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property
[ https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504312#comment-15504312 ]

Siddharth Seth commented on TEZ-3330:
-------------------------------------

Getting this patch in will need some test changes. I'll see if I can get to this sometime; otherwise, [~manuel.godbert], if you can, please update the patch so that it can be committed.
[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property
[ https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493310#comment-15493310 ]

Manuel Godbert commented on TEZ-3330:
-------------------------------------

Thanks, this solves the issue!
[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property
[ https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390313#comment-15390313 ]

TezQA commented on TEZ-3330:
----------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12819719/TEZ-3330.temp.2.patch
  against master revision 97fa44f.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1873//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1873//console

This message is automatically generated.
[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property
[ https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390172#comment-15390172 ]

Siddharth Seth commented on TEZ-3330:
-------------------------------------

bq. the "this.conf.addResource(conf)" in the patch does not affect properties already present in the initial conf.

Good point. That's not how it should have been done anyway. There's already a helper to do this. Let me try making a quick change to the patch.
[~mandecannes] - feel free to update the patch as well, as you see fit.
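
The point quoted in the comment above is that one way of combining two configurations leaves properties already present in the target untouched, so a stale or default value can survive the merge. The sketch below is only an analogy for that distinction, written against plain `java.util.Map` rather than Hadoop's `Configuration`. The names `ConfMergeDemo`, `addAsDefaults`, and `copyAll` are hypothetical; they are not the helper mentioned in the comment, which the thread does not name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical analogy: Maps stand in for configuration objects.
final class ConfMergeDemo {
    // Adds only the keys the target lacks; values already in the target win.
    static void addAsDefaults(Map<String, String> target, Map<String, String> source) {
        for (Map.Entry<String, String> e : source.entrySet()) {
            target.putIfAbsent(e.getKey(), e.getValue());
        }
    }

    // Copies every entry explicitly; values from the source win.
    static void copyAll(Map<String, String> target, Map<String, String> source) {
        target.putAll(source);
    }

    public static void main(String[] args) {
        Map<String, String> initial = new HashMap<>();
        initial.put("some.key", "initial-value");

        Map<String, String> incoming = new HashMap<>();
        incoming.put("some.key", "incoming-value");
        incoming.put("avro.output.schema", "\"string\"");

        addAsDefaults(initial, incoming);
        // "some.key" keeps its initial value; only the missing key was added.
        System.out.println(initial.get("some.key"));

        copyAll(initial, incoming);
        // Now "some.key" carries the incoming value.
        System.out.println(initial.get("some.key"));
    }
}
```

Which of the two behaviours is wanted depends on which side should take precedence; the sketch makes no claim about what the committed patch does.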
[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property
[ https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384408#comment-15384408 ]

Manuel Godbert commented on TEZ-3330:

Hello, thanks for the patch. I just tested it, it solves the shuffle error but not the second issue. The full trace is:
{code}
task:java.lang.NullPointerException
    at java.io.StringReader.<init>(StringReader.java:50)
    at org.apache.avro.Schema$Parser.parse(Schema.java:917)
    at org.apache.avro.Schema.parse(Schema.java:966)
    at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
    at org.apache.avro.mapred.AvroSerialization.getDeserializer(AvroSerialization.java:53)
    at org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:90)
    at org.apache.tez.runtime.library.common.ValuesIterator.<init>(ValuesIterator.java:81)
    at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.createValuesIterator(OrderedGroupedKVInput.java:280)
    at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.waitForInputReady(OrderedGroupedKVInput.java:176)
    at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.getReader(OrderedGroupedKVInput.java:240)
    at org.apache.tez.mapreduce.processor.reduce.ReduceProcessor.run(ReduceProcessor.java:130)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
{code}
Regards
[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property
[ https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375958#comment-15375958 ]

TezQA commented on TEZ-3330:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12817802/TEZ-3330.temp.patch
against master revision 55f5186.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1850//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1850//console

This message is automatically generated.
[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property
[ https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375782#comment-15375782 ]

Siddharth Seth commented on TEZ-3330:

I don't think there's any way to do this at the moment. Attaching a temporary patch for this. Don't think fixing this properly is trivial; well we could just skip the ConfigBuilders altogether.

> Error on avro M/R job with Tez: missing configuration property
> --------------------------------------------------------------
>
>         Key: TEZ-3330
>         URL: https://issues.apache.org/jira/browse/TEZ-3330
>     Project: Apache Tez
>  Issue Type: Bug
>    Reporter: Manuel Godbert
> Attachments: TEZ-3330.temp.patch
>
> I tried running the simple avro M/R job MapredColorCount, that I found in the
> examples of avro release 1.7.7.
> It failed with the following trace:
> {code}
> errorMessage=Shuffle Runner Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: Error while doing final merge
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:378)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:337)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.NullPointerException
>     at java.io.StringReader.<init>(StringReader.java:50)
>     at org.apache.avro.Schema$Parser.parse(Schema.java:917)
>     at org.apache.avro.Schema.parse(Schema.java:966)
>     at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
>     at org.apache.avro.mapred.AvroKeyComparator.setConf(AvroKeyComparator.java:39)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
>     at org.apache.tez.runtime.library.common.ConfigUtils.getIntermediateInputKeyComparator(ConfigUtils.java:133)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.finalMerge(MergeManager.java:915)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:540)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:376)
>     ... 6 more
> {code}
> Digging a bit I saw that during shuffle Tez can't access some of the
> configuration properties of the job. In our example it is the
> avro.output.schema that is missing.
> With some more complicated code I could get one step further and a similar
> issue happened when the valuesIterator for the reducer was being built:
> {code}
> java.lang.NullPointerException
>     at java.io.StringReader.<init>(StringReader.java:50)
>     at org.apache.avro.Schema$Parser.parse(Schema.java:917)
>     at org.apache.avro.Schema.parse(Schema.java:966)
>     at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
>     at org.apache.avro.mapred.AvroSerialization.getDeserializer(AvroSerialization.java:53)
>     at org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:90)
>     at org.apache.tez.runtime.library.common.ValuesIterator.<init>(ValuesIterator.java:80)
>     at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.createValuesIterator(OrderedGroupedKVInput.java:287)
> {code}
> I am using HDP2.4, Tez 0.7.0, avro 1.7.4

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
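The NullPointerException at the top of both traces is what the JDK raises when a StringReader is built from a null string, which is consistent with the schema property being absent from the configuration the task sees. A minimal Java sketch of that failure mode follows. It uses a plain Map as a stand-in for the job configuration, and the names MissingSchemaSketch, openSchema and failsWithNpe are hypothetical, not Avro or Tez code; only the avro.output.schema key comes from the report.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the task-side configuration; not Hadoop's Configuration class.
class MissingSchemaSketch {
    static final String SCHEMA_KEY = "avro.output.schema";

    // Looks up the schema text and wraps it in a reader without a null check,
    // which is the shape of the call chain the traces point at (an assumption
    // about the caller, not a copy of Avro's code).
    static StringReader openSchema(Map<String, String> conf) {
        String json = conf.get(SCHEMA_KEY); // null when the key never reached the task
        return new StringReader(json);      // throws NullPointerException for a null string
    }

    // Returns true when opening the schema fails with a NullPointerException.
    static boolean failsWithNpe(Map<String, String> conf) {
        try {
            openSchema(conf);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println("key missing, NPE: " + failsWithNpe(conf));
        conf.put(SCHEMA_KEY, "\"string\""); // placeholder schema text
        System.out.println("key present, NPE: " + failsWithNpe(conf));
    }
}
```

The point of the sketch is only that the exception says nothing about which key was missing; the reporter had to dig into the shuffle configuration to find that out.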
[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property
[ https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372798#comment-15372798 ]

Manuel Godbert commented on TEZ-3330:

I already tried that actually, with no success: the configuration property becomes available during shuffle but its value is the constant value of the tez-site.xml, not the value dynamically built at job setup.
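One way to read this result, sketched below in Java with plain maps: if the task-side configuration is assembled from the site file plus whatever job keys survive filtering, a site entry fills the gap left by the dropped job key but cannot carry the per-job schema. The merge order, the class name SiteDefaultSketch and the placeholder values are assumptions made for illustration; this is not Tez's actual configuration code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of how a task might end up with a site-wide value
// instead of the value computed at job setup.
class SiteDefaultSketch {
    static final String SCHEMA_KEY = "avro.output.schema";

    // Site defaults first, then job keys on top (assumed merge order).
    static Map<String, String> effectiveConf(Map<String, String> siteDefaults,
                                             Map<String, String> survivingJobKeys) {
        Map<String, String> merged = new HashMap<>(siteDefaults);
        merged.putAll(survivingJobKeys);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> site = new HashMap<>();
        site.put(SCHEMA_KEY, "STATIC_SCHEMA_FROM_SITE_FILE"); // placeholder value

        // The job computed its own schema at setup, but in this model the key
        // was filtered out before reaching the task, so it is absent here.
        Map<String, String> survivingJobKeys = new HashMap<>();

        Map<String, String> seenByTask = effectiveConf(site, survivingJobKeys);
        System.out.println(seenByTask.get(SCHEMA_KEY));
    }
}
```

Under these assumptions the task no longer sees a null, yet it still reads a schema that does not belong to the running job, which matches the behaviour described above.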
[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property
[ https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371859#comment-15371859 ]

Hitesh Shah commented on TEZ-3330:

For now, can you try adding the configs in question into tez-site.xml and see if that gets you past the error?
[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property
[ https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371714#comment-15371714 ]

Manuel Godbert commented on TEZ-3330:

This would be nice. Before a fix is available, do you know if there is a way to parameterize the filter, defining the keys I need to keep in a special place for example? Or any other kind of workaround?
[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property
[ https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371177#comment-15371177 ]

Siddharth Seth commented on TEZ-3330:

That makes sense. Maybe we should consider removing the filtering completely..
[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property
[ https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367878#comment-15367878 ]

Hitesh Shah commented on TEZ-3330:

[~sseth] This is likely due to how we keep the configs small in the inputs/outputs by filtering out the non-required settings. In MR mode, should we just pass in all configs into each Input and Output given that we have no guarantees on what is being used/not-used?
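The filtering described here can be pictured with a small Java sketch. It assumes a prefix-based allow-list, which is a guess at the mechanism and not something the thread spells out; ConfigFilterSketch, KEPT_PREFIXES and the sample values are hypothetical. With the filter on, a key such as avro.output.schema never reaches the Input or Output; with passAll set (the MR-mode idea raised in the comment), it does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of trimming a job configuration before handing it to an Input/Output.
class ConfigFilterSketch {
    // Assumed allow-list; the real set of retained keys is not given in this thread.
    static final String[] KEPT_PREFIXES = {"tez.runtime."};

    static boolean isKept(String key) {
        for (String prefix : KEPT_PREFIXES) {
            if (key.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Copies only allow-listed keys, or every key when passAll is true.
    static Map<String, String> filter(Map<String, String> jobConf, boolean passAll) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : jobConf.entrySet()) {
            if (passAll || isKept(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put("tez.runtime.io.sort.mb", "100");      // sample value
        jobConf.put("avro.output.schema", "\"string\"");   // placeholder schema text

        System.out.println("filtered has schema: "
                + filter(jobConf, false).containsKey("avro.output.schema"));
        System.out.println("pass-all has schema: "
                + filter(jobConf, true).containsKey("avro.output.schema"));
    }
}
```

The trade-off is the one the comment names: a filter keeps per-Input and per-Output payloads small, but it can only be safe when the framework knows every key user code will read, which an arbitrary M/R job does not guarantee.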