[jira] [Assigned] (TEZ-1248) Reduce slow-start should special case 1 reducer runs
[ https://issues.apache.org/jira/browse/TEZ-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang reassigned TEZ-1248: - Assignee: Zhiyuan Yang > Reduce slow-start should special case 1 reducer runs > > > Key: TEZ-1248 > URL: https://issues.apache.org/jira/browse/TEZ-1248 > Project: Apache Tez > Issue Type: Improvement >Affects Versions: 0.5.0 > Environment: 20 node cluster running tez >Reporter: Gopal V >Assignee: Zhiyuan Yang >Priority: Critical > > Reducer slow-start has a performance problem in the small case where there > is just 1 reducer and a single wave. > Tez knows the split count and wave count, so it should be able to determine whether the > cluster has enough spare capacity to run the reducer earlier for lower > latency in an N-mapper -> 1-reducer case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
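Because Tez knows the split count, the check described in this issue reduces to a ceiling division over the free containers. The Java sketch below is hypothetical: `SingleReducerSlowStart`, `shouldStartReducer` and the `freeSlots` input are illustrative names, not Tez APIs, and it assumes the scheduler has an estimate of how many containers are free. It is not how ShuffleVertexManager actually implements slow-start.

```java
/**
 * Hypothetical sketch (not Tez code): decide whether a lone reducer can be
 * scheduled immediately instead of waiting for the slow-start threshold.
 */
final class SingleReducerSlowStart {

    /** Waves needed to run {@code tasks} tasks on {@code freeSlots} containers (ceiling division). */
    static int waves(int tasks, int freeSlots) {
        if (freeSlots <= 0) {
            throw new IllegalArgumentException("freeSlots must be positive");
        }
        return (tasks + freeSlots - 1) / freeSlots;
    }

    /**
     * @param numReducers       parallelism of the downstream vertex
     * @param numMappers        N, i.e. the split count of the upstream vertex
     * @param completedMappers  upstream tasks that have finished so far
     * @param freeSlots         containers the cluster can offer right now (assumed known)
     * @param slowStartFraction ordinary slow-start threshold, e.g. 0.25
     */
    static boolean shouldStartReducer(int numReducers, int numMappers, int completedMappers,
                                      int freeSlots, double slowStartFraction) {
        // Special case: exactly one reducer, and the N mappers plus that reducer
        // all fit into a single wave, so starting it early takes no slot from a mapper.
        boolean fitsInOneWave = freeSlots > 0 && waves(numMappers + 1, freeSlots) == 1;
        if (numReducers == 1 && fitsInOneWave) {
            return true;
        }
        // Otherwise fall back to the usual fraction-based slow-start.
        return completedMappers >= Math.ceil(slowStartFraction * numMappers);
    }
}
```

For example, 10 mappers and 20 free slots start the single reducer immediately, while 10 mappers and exactly 10 free slots still wait for 3 mappers (ceil(0.25 * 10)) to finish.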
[jira] [Commented] (TEZ-3331) Add operation specific HDFS counters for Tez UI
[ https://issues.apache.org/jira/browse/TEZ-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368605#comment-15368605 ] Ming Ma commented on TEZ-3331: -- To support this, it seems Tez needs to change its dependency to Hadoop 2.8. In addition, there are several other 2.8 YARN and HDFS features Tez can benefit from. Maybe in the next major release, Tez can switch from Hadoop 2.6 to Hadoop 2.8? > Add operation specific HDFS counters for Tez UI > --- > > Key: TEZ-3331 > URL: https://issues.apache.org/jira/browse/TEZ-3331 > Project: Apache Tez > Issue Type: Bug >Reporter: Jitendra Nath Pandey > > Hadoop has added several operation specific counters in the FileSystem > statistics (HADOOP-13065). These counters are useful to track file system > operations more granularly. It would be great to track these counters for Tez > and expose them via UI as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3332) Parallelize closing of outputs
[ https://issues.apache.org/jira/browse/TEZ-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368598#comment-15368598 ] Rohini Palaniswamy commented on TEZ-3332: - The example below is on tiny data, so it finished fast. For larger data, parallelizing can provide a considerable speedup. {code} 2016-07-07 21:39:23,392 [INFO] [TezChild] |dflt.DefaultSorter|: Starting flush of map output 2016-07-07 21:39:23,392 [INFO] [TezChild] |dflt.DefaultSorter|: scope-525: Sorting & Spilling map output. bufstart = 0, bufend = 4091674, bufvoid = 268435456; kvstart=67108860(268435440), kvend = 67104732(268418928), length = 4129/16777216 2016-07-07 21:39:23,419 [INFO] [TezChild] |compress.CodecPool|: Got brand-new compressor [.lzo_deflate] 2016-07-07 21:39:23,452 [INFO] [TezChild] |mapReduceLayer.PigCombiner$Combine|: Aliases being processed per job phase (AliasName[line,offset]): null 2016-07-07 21:39:23,860 [INFO] [TezChild] |dflt.DefaultSorter|: scope-525: Finished spill 0 2016-07-07 21:39:23,894 [INFO] [TezChild] |dflt.DefaultSorter|: Starting flush of map output 2016-07-07 21:39:23,894 [INFO] [TezChild] |dflt.DefaultSorter|: scope-554: Sorting & Spilling map output. bufstart = 0, bufend = 493566, bufvoid = 268435456; kvstart=67108860(268435440), kvend = 67102792(268411168), length = 6069/16777216 2016-07-07 21:39:24,127 [INFO] [TezChild] |dflt.DefaultSorter|: scope-554: Finished spill 0 2016-07-07 21:39:24,130 [INFO] [TezChild] |dflt.DefaultSorter|: Starting flush of map output 2016-07-07 21:39:24,130 [INFO] [TezChild] |dflt.DefaultSorter|: scope-512: Sorting & Spilling map output.
bufstart = 0, bufend = 769, bufvoid = 268435456; kvstart=67108860(268435440), kvend = 67108856(268435424), length = 5/16777216 2016-07-07 21:39:24,148 [INFO] [TezChild] |dflt.DefaultSorter|: scope-512: Finished spill 0 2016-07-07 21:39:24,151 [INFO] [TezChild] |shuffle.ShuffleUtils|: EmptyPartition bitsetSize=18, numOutputs=20, emptyPartitions=18, compressedSize=11 2016-07-07 21:39:24,152 [INFO] [TezChild] |dflt.DefaultSorter|: Starting flush of map output 2016-07-07 21:39:24,152 [INFO] [TezChild] |dflt.DefaultSorter|: scope-490: Sorting & Spilling map output. bufstart = 0, bufend = 5539516, bufvoid = 268435456; kvstart=67108860(268435440), kvend = 67107376(268429504), length = 1485/16777216 2016-07-07 21:39:24,361 [INFO] [TezChild] |dflt.DefaultSorter|: scope-490: Finished spill 0 2016-07-07 21:39:24,363 [INFO] [TezChild] |dflt.DefaultSorter|: Starting flush of map output 2016-07-07 21:39:24,363 [INFO] [TezChild] |dflt.DefaultSorter|: scope-541: Sorting & Spilling map output. bufstart = 0, bufend = 12169, bufvoid = 268435456; kvstart=67108860(268435440), kvend = 67108736(268434944), length = 125/16777216 2016-07-07 21:39:24,662 [INFO] [TezChild] |dflt.DefaultSorter|: scope-541: Finished spill 0 {code} > Parallelize closing of outputs > -- > > Key: TEZ-3332 > URL: https://issues.apache.org/jira/browse/TEZ-3332 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rohini Palaniswamy > > Currently it is serial and when there are multiple outputs it can take time > to finish sorting and running the combiner -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1248) Reduce slow-start should special case 1 reducer runs
[ https://issues.apache.org/jira/browse/TEZ-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-1248: Priority: Critical (was: Minor) > Reduce slow-start should special case 1 reducer runs > > > Key: TEZ-1248 > URL: https://issues.apache.org/jira/browse/TEZ-1248 > Project: Apache Tez > Issue Type: Improvement >Affects Versions: 0.5.0 > Environment: 20 node cluster running tez >Reporter: Gopal V >Priority: Critical > > Reducer slow-start has a performance problem in the small case where there > is just 1 reducer and a single wave. > Tez knows the split count and wave count, so it should be able to determine whether the > cluster has enough spare capacity to run the reducer earlier for lower > latency in an N-mapper -> 1-reducer case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-3332) Parallelize closing of outputs
Rohini Palaniswamy created TEZ-3332: --- Summary: Parallelize closing of outputs Key: TEZ-3332 URL: https://issues.apache.org/jira/browse/TEZ-3332 Project: Apache Tez Issue Type: Improvement Reporter: Rohini Palaniswamy Currently it is serial and when there are multiple outputs it can take time to finish sorting and running the combiner -- This message was sent by Atlassian JIRA (v6.3.4#6332)
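A minimal Java sketch of what parallelizing the close step could look like, using only `java.util.concurrent`. `ClosableOutput` and `ParallelOutputCloser` are hypothetical stand-ins, not Tez's output API. The sketch assumes that each output's close (flush, sort, combine, spill) is independent of the others, as the separate per-scope flushes in the log in the comment above suggest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for an output whose close() flushes, sorts, combines and spills. */
interface ClosableOutput {
    String name();
    void close() throws Exception;
}

/** Sketch: close all outputs concurrently on a bounded pool instead of one by one. */
final class ParallelOutputCloser {

    static List<String> closeAll(List<ClosableOutput> outputs, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, threads));
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (ClosableOutput out : outputs) {
                // Each close runs as its own task; returning the name reports what finished.
                futures.add(pool.submit(() -> {
                    out.close();
                    return out.name();
                }));
            }
            List<String> closed = new ArrayList<>();
            for (Future<String> f : futures) {
                // get() blocks until that close is done and wraps any failure in ExecutionException.
                closed.add(f.get());
            }
            return closed;
        } finally {
            // Interrupts any closes still running if an earlier get() threw.
            pool.shutdownNow();
        }
    }
}
```

With `threads` set to 1 this degenerates to the current serial behaviour; a bounded pool caps how many closes run at once.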
[jira] [Updated] (TEZ-3331) Add operation specific HDFS counters for Tez UI
[ https://issues.apache.org/jira/browse/TEZ-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated TEZ-3331: -- Description: Hadoop has added several operation specific counters in the FileSystem statistics (HADOOP-13065). These counters are useful to track file system operations more granularly. It would be great to track these counters for Tez and expose them via UI as well. (was: Hadoop has added several operation specific counters in the FileSystem statistics. These counters are useful to track file system operations more granularly. It would be great to track these counters for Tez and expose them via UI as well.) > Add operation specific HDFS counters for Tez UI > --- > > Key: TEZ-3331 > URL: https://issues.apache.org/jira/browse/TEZ-3331 > Project: Apache Tez > Issue Type: Bug >Reporter: Jitendra Nath Pandey > > Hadoop has added several operation specific counters in the FileSystem > statistics (HADOOP-13065). These counters are useful to track file system > operations more granularly. It would be great to track these counters for Tez > and expose them via UI as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-3331) Add operation specific HDFS counters for Tez UI
Jitendra Nath Pandey created TEZ-3331: - Summary: Add operation specific HDFS counters for Tez UI Key: TEZ-3331 URL: https://issues.apache.org/jira/browse/TEZ-3331 Project: Apache Tez Issue Type: Bug Reporter: Jitendra Nath Pandey Hadoop has added several operation specific counters in the FileSystem statistics. These counters are useful to track file system operations more granularly. It would be great to track these counters for Tez and expose them via UI as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-3209) Support for fair custom data routing
[ https://issues.apache.org/jira/browse/TEZ-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated TEZ-3209: - Attachment: TEZ-3209.patch > Support for fair custom data routing > > > Key: TEZ-3209 > URL: https://issues.apache.org/jira/browse/TEZ-3209 > Project: Apache Tez > Issue Type: New Feature >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: TEZ-3209.patch, Tez-based demuxer for highly skewed > category data.pdf > > > This is based on an offline discussion with [~gopalv], [~hitesh], > [~jrottinghuis] and [~lohit] w.r.t. the support for efficient processing of > highly skewed unordered partitioned mapper output. Our use case is to demux > highly skewed unordered category data partitioned by category name. Gopal and > Hitesh mentioned the dynamically shuffled join scenario. > One option we discussed is to leverage the auto-parallelism feature with upfront > over-partitioning. That means possible overhead to support a large number of > partitions and unnecessary data movement, as each reducer needs to get data > from all mappers. > Another alternative is to use a custom {{DataMovementType}}, which doesn't > require each reducer to fetch data from all mappers. In that way, a large > partition will be processed by several reducers, each of which will fetch > data from a portion of the mappers. > For example, say there are 100 mappers, each of which has 10 partitions (P1, > ..., P10). Each mapper generates 100MB for its P10 and 1MB for each of its > other partitions (P1, ..., P9). The default SCATTER_GATHER routing means the reducer for P10 > has to process 10GB of input and becomes the bottleneck of the job. With the > fair custom data routing, the P10 belonging to the first 10 mappers will be > processed by one reducer with 1GB of input data. The P10 belonging to the second > 10 mappers will be processed by another reducer, etc. > For further optimization, we can allocate each reducer on the same nodes as > the mappers that it fetches data from.
> To support this, we need TEZ-3206 as well as customized data routing based on > {{VertexManagerPlugin}} and {{EdgeManagerPluginOnDemand}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
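The arithmetic in the example can be checked with a short Java sketch. It models only the routing decision (which reducer receives partition p of mapper m), not an actual `VertexManagerPlugin` or `EdgeManagerPluginOnDemand` implementation. `FairRoutingSketch` and the rule of giving each reducer a contiguous group of mappers are assumptions for illustration; sizes are in MB.

```java
/**
 * Hypothetical sketch of "fair" routing: partition p is split across
 * reducersPerPartition[p] reducers, each fetching it from a contiguous
 * group of mappers. Plain SCATTER_GATHER is the case where every entry is 1.
 */
final class FairRoutingSketch {

    /** Reducer that consumes partition p of mapper m (requires reducersPerPartition[p] >= 1). */
    static int reducerFor(int m, int p, int[] reducersPerPartition, int numMappers) {
        int base = 0;
        for (int i = 0; i < p; i++) {
            base += reducersPerPartition[i];          // reducers used by earlier partitions
        }
        int k = reducersPerPartition[p];
        int mappersPerReducer = (numMappers + k - 1) / k;  // ceiling division
        return base + m / mappersPerReducer;          // always < base + k
    }

    /** Input each reducer receives, given bytes[m][p] produced by mapper m for partition p. */
    static long[] reducerInput(long[][] bytes, int[] reducersPerPartition) {
        int numMappers = bytes.length;
        int totalReducers = 0;
        for (int k : reducersPerPartition) {
            totalReducers += k;
        }
        long[] input = new long[totalReducers];
        for (int m = 0; m < numMappers; m++) {
            for (int p = 0; p < reducersPerPartition.length; p++) {
                input[reducerFor(m, p, reducersPerPartition, numMappers)] += bytes[m][p];
            }
        }
        return input;
    }
}
```

With 100 mappers producing 1 MB for each of P1..P9 and 100 MB for P10, `{1, ..., 1}` (SCATTER_GATHER) sends 10,000 MB to the P10 reducer, while `{1, ..., 1, 10}` spreads P10 over 10 reducers with 1,000 MB each, matching the 10GB vs. 1GB figures above.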
[jira] [Updated] (TEZ-3209) Support for fair custom data routing
[ https://issues.apache.org/jira/browse/TEZ-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated TEZ-3209: - Attachment: (was: TEZ-3209.patch) > Support for fair custom data routing > > > Key: TEZ-3209 > URL: https://issues.apache.org/jira/browse/TEZ-3209 > Project: Apache Tez > Issue Type: New Feature >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: TEZ-3209.patch, Tez-based demuxer for highly skewed > category data.pdf > > > This is based on an offline discussion with [~gopalv], [~hitesh], > [~jrottinghuis] and [~lohit] w.r.t. the support for efficient processing of > highly skewed unordered partitioned mapper output. Our use case is to demux > highly skewed unordered category data partitioned by category name. Gopal and > Hitesh mentioned the dynamically shuffled join scenario. > One option we discussed is to leverage the auto-parallelism feature with upfront > over-partitioning. That means possible overhead to support a large number of > partitions and unnecessary data movement, as each reducer needs to get data > from all mappers. > Another alternative is to use a custom {{DataMovementType}}, which doesn't > require each reducer to fetch data from all mappers. In that way, a large > partition will be processed by several reducers, each of which will fetch > data from a portion of the mappers. > For example, say there are 100 mappers, each of which has 10 partitions (P1, > ..., P10). Each mapper generates 100MB for its P10 and 1MB for each of its > other partitions (P1, ..., P9). The default SCATTER_GATHER routing means the reducer for P10 > has to process 10GB of input and becomes the bottleneck of the job. With the > fair custom data routing, the P10 belonging to the first 10 mappers will be > processed by one reducer with 1GB of input data. The P10 belonging to the second > 10 mappers will be processed by another reducer, etc. > For further optimization, we can allocate each reducer on the same nodes as > the mappers that it fetches data from.
> To support this, we need TEZ-3206 as well as customized data routing based on > {{VertexManagerPlugin}} and {{EdgeManagerPluginOnDemand}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-3209) Support for fair custom data routing
[ https://issues.apache.org/jira/browse/TEZ-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated TEZ-3209: - Attachment: TEZ-3209.patch Here is the WIP patch. Besides the fair routing functionality, it also includes a refactor of ShuffleVertexManager. FairShuffleVertexManager and ShuffleVertexManager share a lot of common functionality. In addition, the fair routing also supports the auto-reduce functionality mentioned in TEZ-2962. The main purpose of this patch is to illustrate the refactoring effort and get input from others on whether the refactor is actually a good idea. If it is, I will create a separate jira for the refactoring effort. > Support for fair custom data routing > > > Key: TEZ-3209 > URL: https://issues.apache.org/jira/browse/TEZ-3209 > Project: Apache Tez > Issue Type: New Feature >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: TEZ-3209.patch, Tez-based demuxer for highly skewed > category data.pdf > > > This is based on an offline discussion with [~gopalv], [~hitesh], > [~jrottinghuis] and [~lohit] w.r.t. the support for efficient processing of > highly skewed unordered partitioned mapper output. Our use case is to demux > highly skewed unordered category data partitioned by category name. Gopal and > Hitesh mentioned the dynamically shuffled join scenario. > One option we discussed is to leverage the auto-parallelism feature with upfront > over-partitioning. That means possible overhead to support a large number of > partitions and unnecessary data movement, as each reducer needs to get data > from all mappers. > Another alternative is to use a custom {{DataMovementType}}, which doesn't > require each reducer to fetch data from all mappers. In that way, a large > partition will be processed by several reducers, each of which will fetch > data from a portion of the mappers. > For example, say there are 100 mappers, each of which has 10 partitions (P1, > ..., P10). Each mapper generates 100MB for its P10 and 1MB for each of its > other partitions (P1, ..., P9).
The default SCATTER_GATHER routing means the reducer for P10 > has to process 10GB of input and becomes the bottleneck of the job. With the > fair custom data routing, the P10 belonging to the first 10 mappers will be > processed by one reducer with 1GB of input data. The P10 belonging to the second > 10 mappers will be processed by another reducer, etc. > For further optimization, we can allocate each reducer on the same nodes as > the mappers that it fetches data from. > To support this, we need TEZ-3206 as well as customized data routing based on > {{VertexManagerPlugin}} and {{EdgeManagerPluginOnDemand}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3326) Display JVM system properties in AM and task logs
[ https://issues.apache.org/jira/browse/TEZ-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367981#comment-15367981 ] TezQA commented on TEZ-3326: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12816843/TEZ-3326.003.patch against master revision 5affb3f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1836//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1836//console This message is automatically generated. > Display JVM system properties in AM and task logs > - > > Key: TEZ-3326 > URL: https://issues.apache.org/jira/browse/TEZ-3326 > Project: Apache Tez > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Eric Badger > Attachments: TEZ-3326.001.patch, TEZ-3326.002.patch, > TEZ-3326.003.patch > > > MapReduce displays JVM system properties via the config > {{mapreduce.jvm.system-properties-to-log}} in both the AM and task logs. This is > useful for debugging environment settings such as the Java version. It would be useful to have > such logging in Tez. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Success: TEZ-3326 PreCommit Build #1836
Jira: https://issues.apache.org/jira/browse/TEZ-3326 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1836/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 4140 lines...] [INFO] Tez ... SUCCESS [ 0.022 s] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 56:52 min [INFO] Finished at: 2016-07-08T16:58:16+00:00 [INFO] Final Memory: 75M/1116M [INFO] == == Adding comment to Jira. == == Comment added. d2c7b9b61ff04b9c148c04504b3817a3e9a102ee logged out == == Finished build. == == Archiving artifacts [description-setter] Description set: TEZ-3326 Recording test results Email was triggered for: Success Sending email for trigger: Success ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Updated] (TEZ-3324) Add kerberos support in ATSImportTool
[ https://issues.apache.org/jira/browse/TEZ-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-3324: -- Description: When running ATSImportTool in a Kerberos environment, the user group information needs to be set; otherwise, it ends up throwing exceptions. > Add kerberos support in ATSImportTool > - > > Key: TEZ-3324 > URL: https://issues.apache.org/jira/browse/TEZ-3324 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Zhiyuan Yang > > When running ATSImportTool in a Kerberos environment, the user group information > needs to be set; otherwise, it ends up throwing exceptions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property
[ https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367878#comment-15367878 ] Hitesh Shah commented on TEZ-3330: -- [~sseth] This is likely due to how we keep the configs small in the inputs/outputs by filtering out the non-required settings. In MR mode, should we just pass all configs into each Input and Output, given that we have no guarantees on what is or is not being used? > Error on avro M/R job with Tez: missing configuration property > -- > > Key: TEZ-3330 > URL: https://issues.apache.org/jira/browse/TEZ-3330 > Project: Apache Tez > Issue Type: Bug >Reporter: Manuel Godbert > > I tried running the simple avro M/R job MapredColorCount, which I found in the > examples of avro release 1.7.7. > It failed with the following trace: > {code} > errorMessage=Shuffle Runner > Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: > Error while doing final merge > at > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:378) > at > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:337) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.lang.NullPointerException > at java.io.StringReader.<init>(StringReader.java:50) > at org.apache.avro.Schema$Parser.parse(Schema.java:917) > at org.apache.avro.Schema.parse(Schema.java:966) > at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78) > at > org.apache.avro.mapred.AvroKeyComparator.setConf(AvroKeyComparator.java:39) > at >
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136) > at > org.apache.tez.runtime.library.common.ConfigUtils.getIntermediateInputKeyComparator(ConfigUtils.java:133) > at > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.finalMerge(MergeManager.java:915) > at > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:540) > at > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:376) > ... 6 more > {code} > Digging a bit, I saw that during the shuffle, Tez can't access some of the > configuration properties of the job. In our example, it is the > avro.output.schema property that is missing. > With some more complicated code, I could get one step further, and a similar > issue happened when the valuesIterator for the reducer was being built: > {code} > java.lang.NullPointerException > at java.io.StringReader.<init>(StringReader.java:50) > at org.apache.avro.Schema$Parser.parse(Schema.java:917) > at org.apache.avro.Schema.parse(Schema.java:966) > at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78) > at > org.apache.avro.mapred.AvroSerialization.getDeserializer(AvroSerialization.java:53) > at > org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:90) > at > org.apache.tez.runtime.library.common.ValuesIterator.<init>(ValuesIterator.java:80) > at > org.apache.tez.runtime.library.input.OrderedGroupedKVInput.createValuesIterator(OrderedGroupedKVInput.java:287) > {code} > I am using HDP2.4, Tez 0.7.0, avro 1.7.4 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
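A small, stdlib-only Java model of the hypothesis in the comment above. If the configuration handed to an Input is filtered down to the keys the runtime knows about, a user key such as `avro.output.schema` disappears, and the schema parse ends up calling `new StringReader(null)`, which throws the `NullPointerException` at the top of the cause chain. `FilteredConfSketch` and its allow-list are hypothetical; Tez's real config filtering is not reproduced here.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model of TEZ-3330 (not Tez code): an Input receives only an
 * allow-listed subset of the job configuration, so user keys are lost.
 */
final class FilteredConfSketch {

    /** Copies only the keys in {@code allowedKeys}; everything else is dropped. */
    static Map<String, String> filter(Map<String, String> jobConf, Set<String> allowedKeys) {
        Map<String, String> filtered = new HashMap<>();
        for (Map.Entry<String, String> e : jobConf.entrySet()) {
            if (allowedKeys.contains(e.getKey())) {
                filtered.put(e.getKey(), e.getValue());
            }
        }
        return filtered;
    }

    /**
     * Stand-in for the schema lookup in the trace: a missing key yields null,
     * and new StringReader(null) throws NullPointerException.
     */
    static StringReader openSchema(Map<String, String> conf, String key) {
        return new StringReader(conf.get(key));
    }
}
```

Passing the unfiltered map, which is roughly what the comment proposes for MR mode, avoids the failure at the cost of a larger per-Input configuration.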
[jira] [Commented] (TEZ-3326) Display JVM system properties in AM and task logs
[ https://issues.apache.org/jira/browse/TEZ-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367836#comment-15367836 ] TezQA commented on TEZ-3326: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12816828/TEZ-3326.002.patch against master revision 5affb3f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in : org.apache.tez.test.TestRecovery Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1835//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1835//console This message is automatically generated. > Display JVM system properties in AM and task logs > - > > Key: TEZ-3326 > URL: https://issues.apache.org/jira/browse/TEZ-3326 > Project: Apache Tez > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Eric Badger > Attachments: TEZ-3326.001.patch, TEZ-3326.002.patch, > TEZ-3326.003.patch > > > MapReduce displays JVM system properties via the config > {{mapreduce.jvm.system-properties-to-log}} in both the AM and task logs. This is > useful for debugging environment settings such as the Java version. It would be useful to have > such logging in Tez. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: TEZ-3326 PreCommit Build #1835
Jira: https://issues.apache.org/jira/browse/TEZ-3326 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1835/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 3920 lines...] [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException [ERROR] [Help 2] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :tez-tests [INFO] Build failures were ignored. == == Adding comment to Jira. == == Comment added. 729dc53fc44967846607578bfed3fea1bd0ac92b logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts [description-setter] Could not determine description.
Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Created] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property
Manuel Godbert created TEZ-3330: --- Summary: Error on avro M/R job with Tez: missing configuration property Key: TEZ-3330 URL: https://issues.apache.org/jira/browse/TEZ-3330 Project: Apache Tez Issue Type: Bug Reporter: Manuel Godbert I tried running the simple avro M/R job MapredColorCount, which I found in the examples of avro release 1.7.7. It failed with the following trace: {code} errorMessage=Shuffle Runner Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: Error while doing final merge at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:378) at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:337) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.NullPointerException at java.io.StringReader.<init>(StringReader.java:50) at org.apache.avro.Schema$Parser.parse(Schema.java:917) at org.apache.avro.Schema.parse(Schema.java:966) at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78) at org.apache.avro.mapred.AvroKeyComparator.setConf(AvroKeyComparator.java:39) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136) at org.apache.tez.runtime.library.common.ConfigUtils.getIntermediateInputKeyComparator(ConfigUtils.java:133) at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.finalMerge(MergeManager.java:915) at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:540) at
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:376) ... 6 more {code} Digging a bit, I saw that during the shuffle, Tez can't access some of the configuration properties of the job. In our example, it is the avro.output.schema property that is missing. With some more complicated code, I could get one step further, and a similar issue happened when the valuesIterator for the reducer was being built: {code} java.lang.NullPointerException at java.io.StringReader.<init>(StringReader.java:50) at org.apache.avro.Schema$Parser.parse(Schema.java:917) at org.apache.avro.Schema.parse(Schema.java:966) at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78) at org.apache.avro.mapred.AvroSerialization.getDeserializer(AvroSerialization.java:53) at org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:90) at org.apache.tez.runtime.library.common.ValuesIterator.<init>(ValuesIterator.java:80) at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.createValuesIterator(OrderedGroupedKVInput.java:287) {code} I am using HDP2.4, Tez 0.7.0, avro 1.7.4 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-3326) Display JVM system properties in AM and task logs
[ https://issues.apache.org/jira/browse/TEZ-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated TEZ-3326: - Attachment: TEZ-3326.003.patch bq. Scope.VERTEX reflects whether this config can be changed on a per dag/vertex basis and if yes, it should take effect. In this case, I think AM is better as this is mainly for logging only when a container starts up and has no impact when a new dag/task is run. [~hitesh], thanks for the explanation. I had an incorrect understanding of the scope annotation. I'm attaching a patch that changes the scope back to AM. > Display JVM system properties in AM and task logs > - > > Key: TEZ-3326 > URL: https://issues.apache.org/jira/browse/TEZ-3326 > Project: Apache Tez > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Eric Badger > Attachments: TEZ-3326.001.patch, TEZ-3326.002.patch, > TEZ-3326.003.patch > > > MapReduce displays JVM system properties via the config > {{mapreduce.jvm.system-properties-to-log}} in both the AM and task logs. This is > useful for debugging environment settings such as the Java version. It would be useful to have > such logging in Tez. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3326) Display JVM system properties in AM and task logs
[ https://issues.apache.org/jira/browse/TEZ-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367776#comment-15367776 ] Hitesh Shah commented on TEZ-3326: -- bq. should the ConfigurationScope be Scope.VERTEX given it applies to both AM and tasks? Scope.VERTEX reflects whether this config can be changed on a per dag/vertex basis and if yes, it should take effect. In this case, I think AM is better as this is mainly for logging only when a container starts up and has no impact when a new dag/task is run. > Display JVM system properties in AM and task logs > - > > Key: TEZ-3326 > URL: https://issues.apache.org/jira/browse/TEZ-3326 > Project: Apache Tez > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Eric Badger > Attachments: TEZ-3326.001.patch, TEZ-3326.002.patch > > > MapReduce displays JVM system properties via the config > {{mapreduce.jvm.system-properties-to-log}} in both the AM and task logs. This is > useful for debugging environment settings such as the Java version. It would be useful to have > such logging in Tez. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-3326) Display JVM system properties in AM and task logs
[ https://issues.apache.org/jira/browse/TEZ-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated TEZ-3326: - Attachment: TEZ-3326.002.patch [~mingma] bq. should the ConfigurationScope be Scope.VERTEX given it applies to both AM and tasks? Yes, good catch. Thank you for the review. Attaching a patch that addresses your comments. > Display JVM system properties in AM and task logs > - > > Key: TEZ-3326 > URL: https://issues.apache.org/jira/browse/TEZ-3326 > Project: Apache Tez > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Eric Badger > Attachments: TEZ-3326.001.patch, TEZ-3326.002.patch > > > MapReduce displays JVM system properties via the config > {{mapreduce.jvm.system-properties-to-log}} in both the AM and task logs. This is > useful for debugging environment settings such as the Java version. It would be useful to have > such logging in Tez. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
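A hedged Java sketch of the requested logging. It reads a comma-separated list of property names, in the same format as MapReduce's `mapreduce.jvm.system-properties-to-log`, and builds one log line per property from `System.getProperty`. `JvmPropertyLogger` is a hypothetical name, and no Tez config key is assumed here. Because the list would be read once when a container JVM starts, this fits the AM-level scope argued for in the comment above.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the TEZ-3326 patch): turn a comma-separated list of
 * system property names into "name: value" lines to log at JVM startup.
 */
final class JvmPropertyLogger {

    static List<String> describe(String propertyNames) {
        List<String> lines = new ArrayList<>();
        if (propertyNames == null || propertyNames.trim().isEmpty()) {
            return lines;  // nothing configured, nothing logged
        }
        for (String raw : propertyNames.split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue;  // tolerate stray commas such as "a,,b"
            }
            // Unset properties are reported as "null" rather than skipped.
            lines.add(name + ": " + System.getProperty(name));
        }
        return lines;
    }
}
```

A caller would pass the configured value, for example `"java.version,java.home,user.dir"`, and hand each returned line to its logger.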
[jira] [Commented] (TEZ-3303) Have ShuffleVertexManager consume more precise partition stats
[ https://issues.apache.org/jira/browse/TEZ-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367391#comment-15367391 ] Tsuyoshi Ozawa commented on TEZ-3303: - Thank you, Ming. [~sseth] could you also check the patch? > Have ShuffleVertexManager consume more precise partition stats > -- > > Key: TEZ-3303 > URL: https://issues.apache.org/jira/browse/TEZ-3303 > Project: Apache Tez > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Tsuyoshi Ozawa > Attachments: TEZ-3303.001.patch, TEZ-3303.002.patch, > TEZ-3303.002.patch, TEZ-3303.003.patch > > > TEZ-3216 adds the support for more precise partition stats. > ShuffleVertexManager should be updated to consume the more precise partition > stats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)