date:20160708

[jira] [Assigned] (TEZ-1248) Reduce slow-start should special case 1 reducer runs

2016-07-08 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang reassigned TEZ-1248:
-

Assignee: Zhiyuan Yang

> Reduce slow-start should special case 1 reducer runs
> 
>
> Key: TEZ-1248
> URL: https://issues.apache.org/jira/browse/TEZ-1248
> Project: Apache Tez
>  Issue Type: Improvement
>Affects Versions: 0.5.0
> Environment: 20 node cluster running tez
>Reporter: Gopal V
>Assignee: Zhiyuan Yang
>Priority: Critical
>
> Reducer slow-start performs poorly in small jobs that have just 1 reducer 
> and a single wave of mappers.
> Tez knows the split count and wave count, so it can determine whether the 
> cluster has enough spare capacity to run the reducer earlier, lowering 
> latency in an N-mapper -> 1 reducer case.
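
The special case described above can be pictured as a small scheduling predicate. The sketch below is illustrative only and is not Tez's slow-start logic: the function name, its parameters, and the rule "all mappers plus the reducer fit into the free slots at once" are assumptions made for this example.

```python
import math

def can_start_single_reducer_early(num_mappers, num_reducers, free_slots):
    """Hypothetical check: start the lone reducer alongside the mappers.

    Assumption (not from Tez): one task occupies one slot, and starting
    early is worthwhile only if every mapper plus the reducer fits at once.
    """
    if num_reducers != 1:
        return False  # only the N-mapper -> 1 reducer case is special-cased
    if free_slots <= 0:
        return False
    waves = math.ceil(num_mappers / free_slots)  # waves needed by mappers alone
    return waves == 1 and free_slots >= num_mappers + 1
```

For instance, 10 mappers and 1 reducer with 20 free slots would qualify, while 20 mappers with 20 free slots would not, because the reducer would take a slot that a mapper needs.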



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (TEZ-3331) Add operation specific HDFS counters for Tez UI

2016-07-08 Thread Ming Ma (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368605#comment-15368605
 ] 

Ming Ma commented on TEZ-3331:
--

To support this, it seems Tez needs to change its dependency to Hadoop 2.8. In 
addition, there are several other 2.8 YARN and HDFS features Tez can benefit 
from. Maybe the next major release of Tez can switch from Hadoop 2.6 to 
Hadoop 2.8?

> Add operation specific HDFS counters for Tez UI
> ---
>
> Key: TEZ-3331
> URL: https://issues.apache.org/jira/browse/TEZ-3331
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jitendra Nath Pandey
>
> Hadoop has added several operation specific counters in the FileSystem 
> statistics (HADOOP-13065). These counters are useful to track file system 
> operations more granularly. It would be great to track these counters for Tez 
> and expose them via UI as well.




[jira] [Commented] (TEZ-3332) Parallelize closing of outputs

2016-07-08 Thread Rohini Palaniswamy (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368598#comment-15368598
 ] 

Rohini Palaniswamy commented on TEZ-3332:
-

The example below is on tiny data, so it finished quickly. For larger data, 
parallelizing can provide a considerable speedup.

{code}
2016-07-07 21:39:23,392 [INFO] [TezChild] |dflt.DefaultSorter|: Starting flush of map output
2016-07-07 21:39:23,392 [INFO] [TezChild] |dflt.DefaultSorter|: scope-525: Sorting & Spilling map output. bufstart = 0, bufend = 4091674, bufvoid = 268435456; kvstart=67108860(268435440), kvend = 67104732(268418928), length = 4129/16777216
2016-07-07 21:39:23,419 [INFO] [TezChild] |compress.CodecPool|: Got brand-new compressor [.lzo_deflate]
2016-07-07 21:39:23,452 [INFO] [TezChild] |mapReduceLayer.PigCombiner$Combine|: Aliases being processed per job phase (AliasName[line,offset]): null
2016-07-07 21:39:23,860 [INFO] [TezChild] |dflt.DefaultSorter|: scope-525: Finished spill 0
2016-07-07 21:39:23,894 [INFO] [TezChild] |dflt.DefaultSorter|: Starting flush of map output
2016-07-07 21:39:23,894 [INFO] [TezChild] |dflt.DefaultSorter|: scope-554: Sorting & Spilling map output. bufstart = 0, bufend = 493566, bufvoid = 268435456; kvstart=67108860(268435440), kvend = 67102792(268411168), length = 6069/16777216
2016-07-07 21:39:24,127 [INFO] [TezChild] |dflt.DefaultSorter|: scope-554: Finished spill 0
2016-07-07 21:39:24,130 [INFO] [TezChild] |dflt.DefaultSorter|: Starting flush of map output
2016-07-07 21:39:24,130 [INFO] [TezChild] |dflt.DefaultSorter|: scope-512: Sorting & Spilling map output. bufstart = 0, bufend = 769, bufvoid = 268435456; kvstart=67108860(268435440), kvend = 67108856(268435424), length = 5/16777216
2016-07-07 21:39:24,148 [INFO] [TezChild] |dflt.DefaultSorter|: scope-512: Finished spill 0
2016-07-07 21:39:24,151 [INFO] [TezChild] |shuffle.ShuffleUtils|: EmptyPartition bitsetSize=18, numOutputs=20, emptyPartitions=18, compressedSize=11
2016-07-07 21:39:24,152 [INFO] [TezChild] |dflt.DefaultSorter|: Starting flush of map output
2016-07-07 21:39:24,152 [INFO] [TezChild] |dflt.DefaultSorter|: scope-490: Sorting & Spilling map output. bufstart = 0, bufend = 5539516, bufvoid = 268435456; kvstart=67108860(268435440), kvend = 67107376(268429504), length = 1485/16777216
2016-07-07 21:39:24,361 [INFO] [TezChild] |dflt.DefaultSorter|: scope-490: Finished spill 0
2016-07-07 21:39:24,363 [INFO] [TezChild] |dflt.DefaultSorter|: Starting flush of map output
2016-07-07 21:39:24,363 [INFO] [TezChild] |dflt.DefaultSorter|: scope-541: Sorting & Spilling map output. bufstart = 0, bufend = 12169, bufvoid = 268435456; kvstart=67108860(268435440), kvend = 67108736(268434944), length = 125/16777216
2016-07-07 21:39:24,662 [INFO] [TezChild] |dflt.DefaultSorter|: scope-541: Finished spill 0
{code}
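
To put numbers on the log above: each output starts flushing only after the previous one has finished. The sketch below takes the "Starting flush" and "Finished spill 0" timestamps from the log (written as milliseconds past 21:39:00) and compares the serial wall time with the duration of the slowest single output. That second figure is only an optimistic lower bound for a parallel close: it assumes the flushes are independent and ignores contention for CPU and disk.

```python
# (scope, start_ms, end_ms): milliseconds past 21:39:00, copied from the log.
flushes = [
    ("scope-525", 23392, 23860),
    ("scope-554", 23894, 24127),
    ("scope-512", 24130, 24148),
    ("scope-490", 24152, 24361),
    ("scope-541", 24363, 24662),
]

# Time spent flushing each output on its own.
durations = {scope: end - start for scope, start, end in flushes}

# Serial close: first start to last finish, as observed in the log.
serial_wall_ms = flushes[-1][2] - flushes[0][1]

# Idealized parallel close: bounded below by the slowest single output.
parallel_bound_ms = max(durations.values())

print(durations)
print("serial wall time (ms):", serial_wall_ms)
print("parallel lower bound (ms):", parallel_bound_ms)
```

On this tiny input the serial close takes about 1.27 s, against roughly 0.47 s for the slowest output alone.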

> Parallelize closing of outputs
> --
>
> Key: TEZ-3332
> URL: https://issues.apache.org/jira/browse/TEZ-3332
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>
> Currently, outputs are closed serially, so when there are multiple outputs 
> it can take a long time to finish sorting and running the combiner.




[jira] [Updated] (TEZ-1248) Reduce slow-start should special case 1 reducer runs

2016-07-08 Thread Siddharth Seth (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1248:

Priority: Critical  (was: Minor)

> Reduce slow-start should special case 1 reducer runs
> 
>
> Key: TEZ-1248
> URL: https://issues.apache.org/jira/browse/TEZ-1248
> Project: Apache Tez
>  Issue Type: Improvement
>Affects Versions: 0.5.0
> Environment: 20 node cluster running tez
>Reporter: Gopal V
>Priority: Critical
>
> Reducer slow-start performs poorly in small jobs that have just 1 reducer 
> and a single wave of mappers.
> Tez knows the split count and wave count, so it can determine whether the 
> cluster has enough spare capacity to run the reducer earlier, lowering 
> latency in an N-mapper -> 1 reducer case.




[jira] [Created] (TEZ-3332) Parallelize closing of outputs

2016-07-08 Thread Rohini Palaniswamy (JIRA)

Rohini Palaniswamy created TEZ-3332:
---

 Summary: Parallelize closing of outputs
 Key: TEZ-3332
 URL: https://issues.apache.org/jira/browse/TEZ-3332
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Rohini Palaniswamy


Currently, outputs are closed serially, so when there are multiple outputs it 
can take a long time to finish sorting and running the combiner.






[jira] [Updated] (TEZ-3331) Add operation specific HDFS counters for Tez UI

2016-07-08 Thread Jitendra Nath Pandey (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated TEZ-3331:
--
Description: Hadoop has added several operation specific counters in the 
FileSystem statistics (HADOOP-13065). These counters are useful to track file 
system operations more granularly. It would be great to track these counters 
for Tez and expose them via UI as well.  (was: Hadoop has added several 
operation specific counters in the FileSystem statistics. These counters are 
useful to track file system operations more granularly. It would be great to 
track these counters for Tez and expose them via UI as well.)





[jira] [Created] (TEZ-3331) Add operation specific HDFS counters for Tez UI

2016-07-08 Thread Jitendra Nath Pandey (JIRA)

Jitendra Nath Pandey created TEZ-3331:
-

 Summary: Add operation specific HDFS counters for Tez UI
 Key: TEZ-3331
 URL: https://issues.apache.org/jira/browse/TEZ-3331
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jitendra Nath Pandey


Hadoop has added several operation specific counters in the FileSystem 
statistics. These counters are useful to track file system operations more 
granularly. It would be great to track these counters for Tez and expose them 
via UI as well.




[jira] [Updated] (TEZ-3209) Support for fair custom data routing

2016-07-08 Thread Ming Ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated TEZ-3209:
-
Attachment: TEZ-3209.patch

> Support for fair custom data routing
> 
>
> Key: TEZ-3209
> URL: https://issues.apache.org/jira/browse/TEZ-3209
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: TEZ-3209.patch, Tez-based demuxer for highly skewed 
> category data.pdf
>
>
> This is based on offline discussion with [~gopalv], [~hitesh], 
> [~jrottinghuis] and [~lohit] w.r.t. the support for efficient processing of 
> highly skewed unordered partitioned mapper output. Our use case is to demux 
> highly skewed unordered category data partitioned by category name. Gopal and 
> Hitesh mentioned a dynamically shuffled join scenario.
> One option we discussed is to leverage the auto-parallelism feature with 
> upfront over-partitioning. That means possible overhead to support a large 
> number of partitions and unnecessary data movement, as each reducer needs to 
> get data from all mappers.
> Another alternative is to use a custom {{DataMovementType}} which doesn't 
> require each reducer to fetch data from all mappers. That way, a large 
> partition will be processed by several reducers, each of which will fetch 
> data from a portion of the mappers.
> For example, say there are 100 mappers, each of which has 10 partitions (P1, 
> ..., P10). Each mapper generates 100MB for its P10 and 1MB for each of its 
> (P1, ... P9). The default SCATTER_GATHER routing means the reducer for P10 
> has to process 10GB of input and becomes the bottleneck of the job. With the 
> fair custom data routing, the P10 belonging to the first 10 mappers will be 
> processed by one reducer with 1GB of input data. The P10 belonging to the 
> second 10 mappers will be processed by another reducer, etc.
> For further optimization, we can allocate the reducer on the same nodes as 
> the mappers that it fetches data from.
> To support this, we need TEZ-3206 as well as customized data routing based on 
> {{VertexManagerPlugin}} and {{EdgeManagerPluginOnDemand}}.
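
The arithmetic in the example above can be checked with a short sketch. It is not the patch's routing logic: the function name is hypothetical, and the fixed group size of 10 mappers per reducer is an assumption taken from the example's numbers (with 1GB counted as 1000MB, as the description does).

```python
def reducer_inputs_mb(per_mapper_mb, num_mappers, mappers_per_reducer):
    """Input size (MB) of each reducer when one partition is split by mapper range.

    Hypothetical helper: consecutive groups of `mappers_per_reducer` mappers
    are routed to one reducer each.
    """
    sizes = []
    for first in range(0, num_mappers, mappers_per_reducer):
        group = min(mappers_per_reducer, num_mappers - first)  # last group may be short
        sizes.append(group * per_mapper_mb)
    return sizes

# SCATTER_GATHER: a single reducer fetches P10 from all 100 mappers.
scatter_gather = reducer_inputs_mb(100, 100, 100)

# Fair routing: P10 is spread over reducers that each read from 10 mappers.
fair = reducer_inputs_mb(100, 100, 10)

print(scatter_gather, fair)
```

Under these assumptions the single 10GB reducer input becomes ten reducer inputs of 1GB each.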




[jira] [Updated] (TEZ-3209) Support for fair custom data routing

2016-07-08 Thread Ming Ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated TEZ-3209:
-
Attachment: (was: TEZ-3209.patch)





[jira] [Updated] (TEZ-3209) Support for fair custom data routing

2016-07-08 Thread Ming Ma (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated TEZ-3209:
-
Attachment: TEZ-3209.patch

Here is the WIP patch. Besides the fair routing functionality, it also includes 
a refactoring of ShuffleVertexManager, since FairShuffleVertexManager and 
ShuffleVertexManager share a lot of common functionality.

In addition, fair routing also supports the auto-reduce functionality mentioned 
in TEZ-2962.

The main purpose of this patch is to illustrate the refactoring effort and get 
input from others on whether the refactoring is actually a good idea. If it is, 
I will create a separate jira for the refactoring effort.





[jira] [Commented] (TEZ-3326) Display JVM system properties in AM and task logs

2016-07-08 Thread TezQA (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367981#comment-15367981
 ] 

TezQA commented on TEZ-3326:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12816843/TEZ-3326.003.patch
  against master revision 5affb3f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1836//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1836//console

This message is automatically generated.

> Display JVM system properties in AM and task logs
> -
>
> Key: TEZ-3326
> URL: https://issues.apache.org/jira/browse/TEZ-3326
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Eric Badger
> Attachments: TEZ-3326.001.patch, TEZ-3326.002.patch, 
> TEZ-3326.003.patch
>
>
> MapReduce displays JVM system properties via the config 
> {{mapreduce.jvm.system-properties-to-log}} in both the AM and task logs. This 
> is useful for debugging environment settings such as the Java version. It 
> would be useful to have such logging in Tez.




Success: TEZ-3326 PreCommit Build #1836

2016-07-08 Thread Apache Jenkins Server

Jira: https://issues.apache.org/jira/browse/TEZ-3326
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1836/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 4140 lines...]
[INFO] Tez ... SUCCESS [  0.022 s]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 56:52 min
[INFO] Finished at: 2016-07-08T16:58:16+00:00
[INFO] Final Memory: 75M/1116M
[INFO] 






==
==
Adding comment to Jira.
==
==


Comment added.
d2c7b9b61ff04b9c148c04504b3817a3e9a102ee logged out


==
==
Finished build.
==
==


Archiving artifacts
[description-setter] Description set: TEZ-3326
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Updated] (TEZ-3324) Add kerberos support in ATSImportTool

2016-07-08 Thread Rajesh Balamohan (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-3324:
--
Description: When running ATSImportTool in a Kerberos environment, usergroup 
information needs to be set; without it, the tool ends up throwing exceptions.

> Add kerberos support in ATSImportTool
> -
>
> Key: TEZ-3324
> URL: https://issues.apache.org/jira/browse/TEZ-3324
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Zhiyuan Yang
>
> When running ATSImportTool in a Kerberos environment, usergroup information 
> needs to be set; without it, the tool ends up throwing exceptions.




[jira] [Commented] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property

2016-07-08 Thread Hitesh Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367878#comment-15367878
 ] 

Hitesh Shah commented on TEZ-3330:
--

[~sseth] This is likely due to how we keep the configs small in the 
inputs/outputs by filtering out the non-required settings. In MR mode, should 
we just pass in all configs into each Input and Output given that we have no 
guarantees on what is being used/not-used?  
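
The failure mode suggested in this comment can be illustrated with a toy model. The sketch below is not Tez code: the prefix-based filter, the helper name, and the choice of `tez.runtime.` as the allowed prefix are assumptions made for illustration. It only shows how trimming a job configuration down to framework keys can drop a user property such as `avro.output.schema`, which a comparator or deserializer then fails to find.

```python
def filter_conf(conf, allowed_prefixes):
    """Keep only entries whose key starts with one of the allowed prefixes."""
    return {k: v for k, v in conf.items()
            if any(k.startswith(p) for p in allowed_prefixes)}

job_conf = {
    "tez.runtime.io.sort.mb": "256",              # example framework setting
    "avro.output.schema": '{"type": "string"}',   # user setting read at shuffle time
}

# Assumed filter: only framework-prefixed keys reach the Input/Output.
filtered = filter_conf(job_conf, ["tez.runtime."])

# Passing every config through, as proposed for MR mode, keeps the schema.
unfiltered = dict(job_conf)

print(filtered.get("avro.output.schema"))     # missing after filtering
print(unfiltered.get("avro.output.schema"))
```

In this model the filtered lookup yields nothing, which corresponds to the null schema string behind the NullPointerException in the reported traces.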

> Error on avro M/R job with Tez: missing configuration property
> --
>
> Key: TEZ-3330
> URL: https://issues.apache.org/jira/browse/TEZ-3330
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Manuel Godbert
>
> I tried running the simple avro M/R job MapredColorCount, that I found in the 
> examples of avro release 1.7.7.
> It failed with the following trace:
> {code}
> errorMessage=Shuffle Runner Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: Error while doing final merge
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:378)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:337)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.NullPointerException
>     at java.io.StringReader.<init>(StringReader.java:50)
>     at org.apache.avro.Schema$Parser.parse(Schema.java:917)
>     at org.apache.avro.Schema.parse(Schema.java:966)
>     at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
>     at org.apache.avro.mapred.AvroKeyComparator.setConf(AvroKeyComparator.java:39)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
>     at org.apache.tez.runtime.library.common.ConfigUtils.getIntermediateInputKeyComparator(ConfigUtils.java:133)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.finalMerge(MergeManager.java:915)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:540)
>     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:376)
>     ... 6 more
> {code}
> Digging a bit I saw that during shuffle Tez can't access some of the 
> configuration properties of the job. In our example it is the 
> avro.output.schema that is missing.
> With some more complicated code I could get one step further and a similar 
> issue happened when the valuesIterator for the reducer was being built:
> {code}
> java.lang.NullPointerException
>     at java.io.StringReader.<init>(StringReader.java:50)
>     at org.apache.avro.Schema$Parser.parse(Schema.java:917)
>     at org.apache.avro.Schema.parse(Schema.java:966)
>     at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
>     at org.apache.avro.mapred.AvroSerialization.getDeserializer(AvroSerialization.java:53)
>     at org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:90)
>     at org.apache.tez.runtime.library.common.ValuesIterator.<init>(ValuesIterator.java:80)
>     at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.createValuesIterator(OrderedGroupedKVInput.java:287)
> {code}
> I am using HDP2.4, Tez 0.7.0, avro 1.7.4




[jira] [Commented] (TEZ-3326) Display JVM system properties in AM and task logs

2016-07-08 Thread TezQA (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367836#comment-15367836
 ] 

TezQA commented on TEZ-3326:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12816828/TEZ-3326.002.patch
  against master revision 5affb3f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in :
 org.apache.tez.test.TestRecovery

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1835//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1835//console

This message is automatically generated.





Failed: TEZ-3326 PreCommit Build #1835

2016-07-08 Thread Apache Jenkins Server

Jira: https://issues.apache.org/jira/browse/TEZ-3326
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1835/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 3920 lines...]
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] [Help 2] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :tez-tests
[INFO] Build failures were ignored.






==
==
Adding comment to Jira.
==
==


Comment added.
729dc53fc44967846607578bfed3fea1bd0ac92b logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Created] (TEZ-3330) Error on avro M/R job with Tez: missing configuration property

2016-07-08 Thread Manuel Godbert (JIRA)

Manuel Godbert created TEZ-3330:
---

 Summary: Error on avro M/R job with Tez: missing configuration 
property
 Key: TEZ-3330
 URL: https://issues.apache.org/jira/browse/TEZ-3330
 Project: Apache Tez
  Issue Type: Bug
Reporter: Manuel Godbert


I tried running the simple avro M/R job MapredColorCount, that I found in the 
examples of avro release 1.7.7.
It failed with the following trace:
{code}
errorMessage=Shuffle Runner Failed:org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: Error while doing final merge
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:378)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:337)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.NullPointerException
    at java.io.StringReader.<init>(StringReader.java:50)
    at org.apache.avro.Schema$Parser.parse(Schema.java:917)
    at org.apache.avro.Schema.parse(Schema.java:966)
    at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
    at org.apache.avro.mapred.AvroKeyComparator.setConf(AvroKeyComparator.java:39)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
    at org.apache.tez.runtime.library.common.ConfigUtils.getIntermediateInputKeyComparator(ConfigUtils.java:133)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.finalMerge(MergeManager.java:915)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:540)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:376)
    ... 6 more
{code}

Digging a bit I saw that during shuffle Tez can't access some of the 
configuration properties of the job. In our example it is the 
avro.output.schema that is missing.

With some more complicated code I could get one step further and a similar 
issue happened when the valuesIterator for the reducer was being built:
{code}
java.lang.NullPointerException
    at java.io.StringReader.<init>(StringReader.java:50)
    at org.apache.avro.Schema$Parser.parse(Schema.java:917)
    at org.apache.avro.Schema.parse(Schema.java:966)
    at org.apache.avro.mapred.AvroJob.getMapOutputSchema(AvroJob.java:78)
    at org.apache.avro.mapred.AvroSerialization.getDeserializer(AvroSerialization.java:53)
    at org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:90)
    at org.apache.tez.runtime.library.common.ValuesIterator.<init>(ValuesIterator.java:80)
    at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.createValuesIterator(OrderedGroupedKVInput.java:287)
{code}

I am using HDP 2.4, Tez 0.7.0, and Avro 1.7.4.




[jira] [Updated] (TEZ-3326) Display JVM system properties in AM and task logs

2016-07-08 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated TEZ-3326:
-
Attachment: TEZ-3326.003.patch

bq. Scope.VERTEX reflects whether this config can be changed on a per 
dag/vertex basis and if yes, it should take effect. In this case, I think AM is 
better as this is mainly for logging only when a container starts up and has no 
impact when a new dag/task is run.

[~hitesh], thanks for the explanation. I had an incorrect understanding of the 
scope annotation. I'm attaching a patch that changes the scope back to AM. 

> Display JVM system properties in AM and task logs
> -
>
> Key: TEZ-3326
> URL: https://issues.apache.org/jira/browse/TEZ-3326
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Eric Badger
> Attachments: TEZ-3326.001.patch, TEZ-3326.002.patch, 
> TEZ-3326.003.patch
>
>
> MapReduce displays JVM system properties via config 
> {{mapreduce.jvm.system-properties-to-log}} in both AM and task logs. This is 
> useful for debugging environment settings such as the Java version. It would 
> be useful to have such logging in Tez.
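
A minimal sketch of the kind of logging described in the issue, assuming the property names arrive as a comma-separated string. That input format and the names SystemPropertiesLogSketch and format are assumptions for illustration only; this is not the code from the attached patches.

```java
// Illustrative only: formats selected JVM system properties for a log line.
// The class and method names are hypothetical, not Tez or MapReduce APIs.
class SystemPropertiesLogSketch {

    // Builds one "key=value" line per requested property. Blank entries are
    // skipped; properties that are not set are printed with the value "null".
    static String format(String commaSeparatedKeys) {
        StringBuilder sb = new StringBuilder();
        for (String rawKey : commaSeparatedKeys.split(",")) {
            String key = rawKey.trim();
            if (key.isEmpty()) {
                continue; // System.getProperty rejects empty keys
            }
            sb.append(key).append('=').append(System.getProperty(key)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Example list of standard JVM properties; the output depends on the JVM.
        System.out.print(format("java.version, os.name, java.home"));
    }
}
```

Emitting this once when a container starts would be enough to see which Java version and environment a given AM or task actually ran with.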




[jira] [Commented] (TEZ-3326) Display JVM system properties in AM and task logs

2016-07-08 Thread Hitesh Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367776#comment-15367776
 ] 

Hitesh Shah commented on TEZ-3326:
--

bq. should the ConfigurationScope be Scope.VERTEX given it applies to both AM 
and tasks?

Scope.VERTEX reflects whether this config can be changed on a per dag/vertex 
basis and if yes, it should take effect. In this case, I think AM is better as 
this is mainly for logging only when a container starts up and has no impact 
when a new dag/task is run. 


> Display JVM system properties in AM and task logs
> -
>
> Key: TEZ-3326
> URL: https://issues.apache.org/jira/browse/TEZ-3326
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Eric Badger
> Attachments: TEZ-3326.001.patch, TEZ-3326.002.patch
>
>
> MapReduce displays JVM system properties via config 
> {{mapreduce.jvm.system-properties-to-log}} in both AM and task logs. This is 
> useful for debugging environment settings such as the Java version. It would 
> be useful to have such logging in Tez.




[jira] [Updated] (TEZ-3326) Display JVM system properties in AM and task logs

2016-07-08 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated TEZ-3326:
-
Attachment: TEZ-3326.002.patch

[~mingma]

bq. should the ConfigurationScope be Scope.VERTEX given it applies to both AM 
and tasks?
Yes, good catch 

Thank you for the review. Attaching a patch that addresses your comments. 

> Display JVM system properties in AM and task logs
> -
>
> Key: TEZ-3326
> URL: https://issues.apache.org/jira/browse/TEZ-3326
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Eric Badger
> Attachments: TEZ-3326.001.patch, TEZ-3326.002.patch
>
>
> MapReduce displays JVM system properties via config 
> {{mapreduce.jvm.system-properties-to-log}} in both AM and task logs. This is 
> useful for debugging environment settings such as the Java version. It would 
> be useful to have such logging in Tez.




[jira] [Commented] (TEZ-3303) Have ShuffleVertexManager consume more precise partition stats

2016-07-08 Thread Tsuyoshi Ozawa (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367391#comment-15367391
 ] 

Tsuyoshi Ozawa commented on TEZ-3303:
-

Thank you, Ming.

[~sseth] could you also check the patch?

> Have ShuffleVertexManager consume more precise partition stats
> --
>
> Key: TEZ-3303
> URL: https://issues.apache.org/jira/browse/TEZ-3303
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Tsuyoshi Ozawa
> Attachments: TEZ-3303.001.patch, TEZ-3303.002.patch, 
> TEZ-3303.002.patch, TEZ-3303.003.patch
>
>
> TEZ-3216 adds support for more precise partition stats. 
> ShuffleVertexManager should be updated to consume these more precise 
> partition stats.



