[jira] [Comment Edited] (TEZ-853) Support counters recovery

2014-08-28 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113433#comment-14113433
 ] 

Jeff Zhang edited comment on TEZ-853 at 8/28/14 7:06 AM:
-

[~hitesh]
bq. Do TaskImpl and VertexImpl write counters to recovery but they are not used 
when restoring state? Should the counters be written or recovered from task 
attempts? If the latter, then we should not write them.

No counters are written by VertexImpl or TaskImpl (their counters come from 
TaskAttemptImpl). Should we remove tezCounters from VertexFinishedProto and 
TaskFinishedProto, since we don't actually use them in recovery?

bq. DAGImpl::restoreFromEvent does not seem to restore counters
The DAG does not write any counters; its counters all come from TaskAttemptImpl. 
So as long as the counters of TaskAttemptImpl are recovered, the counters of the 
DAG are recovered.

bq. in a scenario where the dag finished is logged and all other events are 
dropped, I assume counters will be needed?
Yes, you are right. This is a special case. In this case we should write 
counters in DAGFinishedEvent and recover from it. (Will add it.)
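
A minimal sketch of the recovery rule discussed in this comment, under stated assumptions: counters are modeled as plain `Map<String, Long>` values, and `CounterRecoverySketch`, `aggregate`, and `recoverDagCounters` are hypothetical names, not Tez APIs. If counters stored with a DAG-finished record are available they are used directly; otherwise the DAG counters are rebuilt by summing the recovered task-attempt counters.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Tez.
final class CounterRecoverySketch {

    // Sums counters by name across all recovered task attempts.
    static Map<String, Long> aggregate(List<Map<String, Long>> attemptCounters) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> counters : attemptCounters) {
            for (Map.Entry<String, Long> e : counters.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return total;
    }

    // dagFinishedCounters is null when no DAG-finished record carried counters.
    static Map<String, Long> recoverDagCounters(
            List<Map<String, Long>> recoveredAttempts,
            Map<String, Long> dagFinishedCounters) {
        if (dagFinishedCounters != null) {
            // Special case: the attempt events may have been dropped,
            // so trust the counters saved with the DAG-finished record.
            return new HashMap<>(dagFinishedCounters);
        }
        return aggregate(recoveredAttempts);
    }
}
```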




was (Author: zjffdu):
[~hitesh]
bq. Do TaskImpl and VertexImpl write counters to recovery but they are not used 
when restoring state? Should the counters be written or recovered from task 
attempts? If the latter, then we should not write them.

No counters are written by VertexImpl or TaskImpl (their counters come from 
TaskAttemptImpl).

bq. DAGImpl::restoreFromEvent does not seem to restore counters
The DAG does not write any counters; its counters all come from TaskAttemptImpl. 
So as long as the counters of TaskAttemptImpl are recovered, the counters of the 
DAG are recovered.

bq. in a scenario where the dag finished is logged and all other events are 
dropped, I assume counters will be needed?
Yes, you are right. This is a special case. In this case we should write 
counters in DAGFinishedEvent and recover from it. (Will add it.)



> Support counters recovery
> -
>
> Key: TEZ-853
> URL: https://issues.apache.org/jira/browse/TEZ-853
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Jeff Zhang
> Attachments: Tez-853.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (TEZ-1357) Display better diagnostics when AM fails to launch

2014-08-28 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang reassigned TEZ-1357:
---

Assignee: Jeff Zhang

> Display better diagnostics when AM fails to launch
> --
>
> Key: TEZ-1357
> URL: https://issues.apache.org/jira/browse/TEZ-1357
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Jeff Zhang
>






[jira] [Assigned] (TEZ-1512) VertexImpl.getTask(int) can be CPU intensive when lots of tasks are present in the vertex

2014-08-28 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan reassigned TEZ-1512:
-

Assignee: Rajesh Balamohan

> VertexImpl.getTask(int) can be CPU intensive when lots of tasks are present 
> in the vertex
> -
>
> Key: TEZ-1512
> URL: https://issues.apache.org/jira/browse/TEZ-1512
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>  Labels: performance
> Attachments: TEZ-1512.1.WIP.patch, TEZ-1512.2.patch, 
> large_job_small_tasks.svg, with_patch_large_job_small_tasks.svg
>
>
>  I tried a synthetic benchmark (without much input data) with the tez app.  
> This was tried to understand the bare minimum time taken by Tez for container 
> launch / reuse / scheduling etc.
> Profiling DAGAppMaster showed that lots of CPU time was spent on 
> VertexImpl.getTask(int) which gets accessed as a part of event handling and 
> transitions.  
> This problem would be more prevalent in large jobs that have lots of small 
> tasks.
> I will attach the perf SVG output of the DAG soon.





[jira] [Resolved] (TEZ-1512) VertexImpl.getTask(int) can be CPU intensive when lots of tasks are present in the vertex

2014-08-28 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan resolved TEZ-1512.
---

   Resolution: Fixed
Fix Version/s: 0.6.0
 Hadoop Flags: Reviewed

Thanks [~sseth].  Committed to master.

commit ddef389a976793da397856f397398bdddc8db123
Author: Rajesh Balamohan 
Date:   Thu Aug 28 13:41:04 2014 +0530


> VertexImpl.getTask(int) can be CPU intensive when lots of tasks are present 
> in the vertex
> -
>
> Key: TEZ-1512
> URL: https://issues.apache.org/jira/browse/TEZ-1512
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>  Labels: performance
> Fix For: 0.6.0
>
> Attachments: TEZ-1512.1.WIP.patch, TEZ-1512.2.patch, 
> large_job_small_tasks.svg, with_patch_large_job_small_tasks.svg
>
>
>  I tried a synthetic benchmark (without much input data) with the tez app.  
> This was tried to understand the bare minimum time taken by Tez for container 
> launch / reuse / scheduling etc.
> Profiling DAGAppMaster showed that lots of CPU time was spent on 
> VertexImpl.getTask(int) which gets accessed as a part of event handling and 
> transitions.  
> This problem would be more prevalent in large jobs that have lots of small 
> tasks.
> I will attach the perf SVG output of the DAG soon.





[jira] [Updated] (TEZ-1515) DAGAppMaster : Thread contentions due to org.apache.tez.common.counters.ResourceBundles

2014-08-28 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1515:
--

Attachment: detailed_sample_stack_trace.txt
HistoryLoggingService.png
DAGAppMaster_AsyncDispatcher.png
RecoveryService.png

> DAGAppMaster : Thread contentions due to 
> org.apache.tez.common.counters.ResourceBundles
> ---
>
> Key: TEZ-1515
> URL: https://issues.apache.org/jira/browse/TEZ-1515
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>  Labels: performance
> Attachments: DAGAppMaster_AsyncDispatcher.png, 
> HistoryLoggingService.png, RecoveryService.png, 
> detailed_sample_stack_trace.txt
>
>
> Thread profiling of DAGAppMaster for a synthetic Tez test revealed heavy 
> contention in the RecoveryService / HistoryEventHandlingThread / 
> AsyncDispatcher threads. All of these try to access Tez counters and are 
> blocked on "public static synchronized <T> T getValue(String bundleName, 
> String key, String suffix, T defaultValue)".
> I will attach the thread profiler snapshots soon.





[jira] [Created] (TEZ-1515) DAGAppMaster : Thread contentions due to org.apache.tez.common.counters.ResourceBundles

2014-08-28 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created TEZ-1515:
-

 Summary: DAGAppMaster : Thread contentions due to 
org.apache.tez.common.counters.ResourceBundles
 Key: TEZ-1515
 URL: https://issues.apache.org/jira/browse/TEZ-1515
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
 Attachments: DAGAppMaster_AsyncDispatcher.png, 
HistoryLoggingService.png, RecoveryService.png, detailed_sample_stack_trace.txt

Thread profiling of DAGAppMaster for a synthetic Tez test revealed heavy 
contention in the RecoveryService / HistoryEventHandlingThread / 
AsyncDispatcher threads. All of these try to access Tez counters and are 
blocked on "public static synchronized <T> T getValue(String bundleName, 
String key, String suffix, T defaultValue)".

I will attach the thread profiler snapshots soon.
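
One common way to reduce this kind of contention is to cache resolved values in a concurrent map, so that repeated lookups for the same key never take the class-wide lock. The sketch below is illustrative only and is not the change made for this issue: `CachedLookup`, `slowLookup`, and the cache-key format are hypothetical stand-ins for the synchronized lookup.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not the Tez ResourceBundles class.
final class CachedLookup {
    private static final ConcurrentMap<String, String> CACHE = new ConcurrentHashMap<>();

    // Counts how often the slow, lock-protected path is taken.
    static final AtomicInteger SLOW_CALLS = new AtomicInteger();

    // Stand-in for the expensive synchronized static lookup.
    private static synchronized String slowLookup(String bundleName, String key,
                                                  String defaultValue) {
        SLOW_CALLS.incrementAndGet();
        return defaultValue;
    }

    // The first lookup per (bundle, key) takes the lock; later ones hit the cache.
    // Caveat: the cached value is whatever the first lookup returned, so a later
    // call passing a different default for the same key still sees the first value.
    static String getValue(String bundleName, String key, String defaultValue) {
        String cacheKey = bundleName + '#' + key;
        return CACHE.computeIfAbsent(cacheKey,
                k -> slowLookup(bundleName, key, defaultValue));
    }
}
```

The trade-off is staleness: a cache like this only suits values that do not change during the life of the process.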





[jira] [Updated] (TEZ-1495) ATS integration for TezClient

2014-08-28 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-1495:
--

Attachment: TEZ-1495.2.patch

- removed the wrong unit test and added a unit test for getvertex.

> ATS integration for TezClient
> -
>
> Key: TEZ-1495
> URL: https://issues.apache.org/jira/browse/TEZ-1495
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Prakash Ramachandran
>Assignee: Prakash Ramachandran
> Attachments: TEZ-1495.1.patch, TEZ-1495.2.patch, TEZ-1495.WIP.1.patch
>
>
> Tez client should automatically redirect to ATS when the AM is not running.
> All APIs exposed ( DAG status, counters, etc ) from the DAGClient should 
> continue to work after the AM has shut down.





[jira] [Commented] (TEZ-1345) Add checks to guarantee all init events are written to recovery to consider vertex initialized

2014-08-28 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114025#comment-14114025
 ] 

Hitesh Shah commented on TEZ-1345:
--

bq. So overall IMO, I prefer to ignore the init events in the recovery log and 
call the initializer again in recovery. It only affects the performance of 
recovery, while the method of adding a check in canInitVertex would affect the 
performance of a normal run of the DAG. Hitesh Shah, Bikas Saha, what are your 
thoughts?

If you do this, this will result in the vertex starting from scratch. Even 
completed tasks will have to be dropped as there is no guarantee that the 
initializer will generate the same events and assign them in the same way to the 
tasks. 

> Add checks to guarantee all init events are written to recovery to consider 
> vertex initialized
> --
>
> Key: TEZ-1345
> URL: https://issues.apache.org/jira/browse/TEZ-1345
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Jeff Zhang
> Attachments: Tez-1345-2.patch, Tez-1345.patch
>
>
> Related to issue discovered in TEZ-1033





[jira] [Created] (TEZ-1516) Log transfer rate for Broadcast Fetch

2014-08-28 Thread Siddharth Seth (JIRA)
Siddharth Seth created TEZ-1516:
---

 Summary: Log transfer rate for Broadcast Fetch
 Key: TEZ-1516
 URL: https://issues.apache.org/jira/browse/TEZ-1516
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Siddharth Seth








[jira] [Commented] (TEZ-1509) Set a useful default value for java opts

2014-08-28 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114049#comment-14114049
 ] 

Bikas Saha commented on TEZ-1509:
-

Can someone review/comment so that this can be committed for 0.5.0? It's an 
incompatible change.

> Set a useful default value for java opts  
> --
>
> Key: TEZ-1509
> URL: https://issues.apache.org/jira/browse/TEZ-1509
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Bikas Saha
> Attachments: TEZ-1509.1.patch
>
>
> A subset of the following should be considered for the defaults:
> -server -XX:+UseCompressedStrings -Djava.net.preferIPv4Stack=true 
> -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA 
> -XX:+UseParallelGC





[jira] [Commented] (TEZ-1501) Add a test dag to generate load on the getTask RPC

2014-08-28 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114052#comment-14114052
 ] 

Gopal V commented on TEZ-1501:
--

Looks good - +1

Just needs an fs.deleteOnExit() for the PAYLOAD file for cleanups.

> Add a test dag to generate load on the getTask RPC
> --
>
> Key: TEZ-1501
> URL: https://issues.apache.org/jira/browse/TEZ-1501
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: TEZ-1501.1.txt, TEZ-1501.2.txt
>
>






[jira] [Reopened] (TEZ-1510) TezConfiguration should not add tez-site.xml as a default resource.

2014-08-28 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah reopened TEZ-1510:
--


> TezConfiguration should not add tez-site.xml as a default resource. 
> 
>
> Key: TEZ-1510
> URL: https://issues.apache.org/jira/browse/TEZ-1510
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Priority: Blocker
>
> Currently on the first construction of a TezConfiguration, tez-site.xml gets 
> added as a static resource for all future Configuration objects.
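
The effect described here can be mimicked without Hadoop. The sketch below assumes the behaviour resembles a process-wide default-resource list (as with Hadoop's static `Configuration.addDefaultResource`); `MiniConf`, `LeakyTezConf`, and `ScopedTezConf` are hypothetical classes written for illustration and do not reflect the actual patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for a configuration class with static default resources.
class MiniConf {
    // Shared by every instance in the process.
    private static final List<String> DEFAULT_RESOURCES = new CopyOnWriteArrayList<>();
    // Private to this instance.
    private final List<String> resources = new ArrayList<>();

    static void addDefaultResource(String name) {
        if (!DEFAULT_RESOURCES.contains(name)) {
            DEFAULT_RESOURCES.add(name);
        }
    }

    void addResource(String name) {
        resources.add(name);
    }

    List<String> effectiveResources() {
        List<String> all = new ArrayList<>(DEFAULT_RESOURCES);
        all.addAll(resources);
        return all;
    }
}

// Constructing one of these changes every MiniConf created afterwards.
final class LeakyTezConf extends MiniConf {
    LeakyTezConf() {
        MiniConf.addDefaultResource("tez-site.xml");
    }
}

// Constructing one of these only affects this instance.
final class ScopedTezConf extends MiniConf {
    ScopedTezConf() {
        addResource("tez-site.xml");
    }
}
```

With the scoped variant, an unrelated `MiniConf` created later does not see `tez-site.xml`; with the leaky variant it does.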





[jira] [Assigned] (TEZ-1510) TezConfiguration should not add tez-site.xml as a default resource.

2014-08-28 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah reassigned TEZ-1510:


Assignee: Hitesh Shah

> TezConfiguration should not add tez-site.xml as a default resource. 
> 
>
> Key: TEZ-1510
> URL: https://issues.apache.org/jira/browse/TEZ-1510
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
>Priority: Blocker
>
> Currently on the first construction of a TezConfiguration, tez-site.xml gets 
> added as a static resource for all future Configuration objects.





[jira] [Commented] (TEZ-1345) Add checks to guarantee all init events are written to recovery to consider vertex initialized

2014-08-28 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114081#comment-14114081
 ] 

Bikas Saha commented on TEZ-1345:
-

There are 2 alternatives:
1) pessimistic - save events before starting. This costs performance. This 
patch is not really achieving that. 
2) optimistic - save events while starting. The only case where this won't work 
is when the AM crashes immediately after.
In both cases, for now, the contract for init events is that they must be made 
up-front. So it's a one-time thing. When that changes, there will need to be an 
additional mechanism to notify the framework that initing is done. And in fact 
it may not be done until the last block of data gets assigned to an owner, at 
the very end of execution. How recovery is going to work in these cases is 
still not clear, though the optimistic approach still works where it works.
IMO the performance loss is probably not going to be acceptable for short 
queries.
What we could do is add an API that allows the VertexManager to notify the 
framework that it is done making updates. It could also pass along a state 
payload that represents its state in case we need to restart it. That 
notification could be saved in the log. If that notification is present during 
recovery then we can continue to recover from where we left off and also 
provide state to the VM. If that notification is not present in recovery then 
we start from scratch. IMO, in 99% of the cases this should be enough. The 
contract for VMs then clearly becomes: recovery works post DONE notification.
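
The proposed contract can be sketched roughly as follows. This is only an illustration of the idea in this comment, not an existing Tez API: `VertexRecoverySketch`, `LogEvent`, and the `VERTEX_MANAGER_DONE` event name are all hypothetical.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; none of these names exist in Tez.
final class VertexRecoverySketch {

    // Hypothetical name for the "done making updates" notification.
    static final String VM_DONE = "VERTEX_MANAGER_DONE";

    // A recovery-log record: an event type plus an opaque state payload.
    static final class LogEvent {
        final String type;
        final byte[] payload;

        LogEvent(String type, byte[] payload) {
            this.type = type;
            this.payload = payload == null ? new byte[0] : payload;
        }
    }

    // Returns the saved VertexManager state if the DONE notification was logged.
    static Optional<byte[]> findDoneState(List<LogEvent> recoveryLog) {
        for (LogEvent event : recoveryLog) {
            if (VM_DONE.equals(event.type)) {
                return Optional.of(event.payload);
            }
        }
        return Optional.empty();
    }

    // DONE present: resume from the log. DONE absent: start the vertex from scratch.
    static String recover(List<LogEvent> recoveryLog) {
        return findDoneState(recoveryLog).isPresent() ? "RESUME" : "RESTART";
    }
}
```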

> Add checks to guarantee all init events are written to recovery to consider 
> vertex initialized
> --
>
> Key: TEZ-1345
> URL: https://issues.apache.org/jira/browse/TEZ-1345
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Jeff Zhang
> Attachments: Tez-1345-2.patch, Tez-1345.patch
>
>
> Related to issue discovered in TEZ-1033





[jira] [Updated] (TEZ-1501) Add a test dag to generate load on the getTask RPC

2014-08-28 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1501:


Attachment: TEZ-1501.3.txt

Updated patch. Committing. Thanks for the review.

> Add a test dag to generate load on the getTask RPC
> --
>
> Key: TEZ-1501
> URL: https://issues.apache.org/jira/browse/TEZ-1501
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 0.6.0
>
> Attachments: TEZ-1501.1.txt, TEZ-1501.2.txt, TEZ-1501.3.txt
>
>






[jira] [Resolved] (TEZ-1501) Add a test dag to generate load on the getTask RPC

2014-08-28 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved TEZ-1501.
-

   Resolution: Fixed
Fix Version/s: 0.6.0

Committed to master.

> Add a test dag to generate load on the getTask RPC
> --
>
> Key: TEZ-1501
> URL: https://issues.apache.org/jira/browse/TEZ-1501
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 0.6.0
>
> Attachments: TEZ-1501.1.txt, TEZ-1501.2.txt, TEZ-1501.3.txt
>
>






[jira] [Updated] (TEZ-1510) TezConfiguration should not add tez-site.xml as a default resource.

2014-08-28 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1510:
-

Attachment: TEZ-1510.1.patch

[~sseth] review please. 

> TezConfiguration should not add tez-site.xml as a default resource. 
> 
>
> Key: TEZ-1510
> URL: https://issues.apache.org/jira/browse/TEZ-1510
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
>Priority: Blocker
> Attachments: TEZ-1510.1.patch
>
>
> Currently on the first construction of a TezConfiguration, tez-site.xml gets 
> added as a static resource for all future Configuration objects.





[jira] [Commented] (TEZ-1510) TezConfiguration should not add tez-site.xml as a default resource.

2014-08-28 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114123#comment-14114123
 ] 

Bikas Saha commented on TEZ-1510:
-

Does the test fail without the changes?

> TezConfiguration should not add tez-site.xml as a default resource. 
> 
>
> Key: TEZ-1510
> URL: https://issues.apache.org/jira/browse/TEZ-1510
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
>Priority: Blocker
> Attachments: TEZ-1510.1.patch
>
>
> Currently on the first construction of a TezConfiguration, tez-site.xml gets 
> added as a static resource for all future Configuration objects.





[jira] [Commented] (TEZ-1511) MROutputConfigBuilder sets OutputFormat as String class if OutputFormat is not provided

2014-08-28 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114131#comment-14114131
 ] 

Hitesh Shah commented on TEZ-1511:
--

The use of NEW_API_CONFIG seems incorrect. There should be 2 such fields - one 
for mapper and one for the reducer - as there seems to be a mix of both 
mapper.new-api and reducer.new-api being used ( though not sure if that is 
intended or a bug ).

For future reference, Configuration::getClassByName seems a better 
implementation than ReflectionUtils::getClazz. 


> MROutputConfigBuilder sets OutputFormat as String class if OutputFormat is 
> not provided
> ---
>
> Key: TEZ-1511
> URL: https://issues.apache.org/jira/browse/TEZ-1511
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Bikas Saha
>Priority: Blocker
> Attachments: TEZ-1511.1.patch, TEZ-1511.2.patch
>
>
> Code uses: 
> {code}
>   this.outputFormat = 
> ReflectionUtils.getClass(conf.get(MRJobConfig.OUTPUT_FORMAT_CLASS_ATTR));
> } else {
>   this.outputFormat = 
> ReflectionUtils.getClass(conf.get("mapred.output.format.class"));
> {code}
> where ReflectionUtils has :
> {code}
>  <T> Class<T> getClass(T o)
> {code}
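
The pitfall in this description can be reproduced in isolation. The sketch below does not use Hadoop or Tez classes: `GetClassDemo`, `getClassOf`, and `loadByName` are hypothetical stand-ins, with `getClassOf` mirroring the `<T> Class<T> getClass(T o)` shape, and `java.util.ArrayList` serving only as a placeholder for a configured class name.

```java
// Hypothetical illustration only; not the Hadoop or Tez ReflectionUtils.
final class GetClassDemo {

    // Mirrors the generic helper's shape: returns the runtime class of its
    // argument. Given a String holding a class name, that is String.class.
    @SuppressWarnings("unchecked")
    static <T> Class<T> getClassOf(T o) {
        return (Class<T>) o.getClass();
    }

    // What the caller actually wanted: resolve the class that the name refers to.
    static Class<?> loadByName(String className) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown class: " + className, e);
        }
    }

    public static void main(String[] args) {
        String configured = "java.util.ArrayList"; // placeholder class name
        System.out.println(getClassOf(configured)); // the String class, not the named class
        System.out.println(loadByName(configured)); // the class named by the string
    }
}
```

Because the generic helper accepts any object, passing a class name compiles without warning, which is why the mistake is easy to miss.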





[jira] [Updated] (TEZ-1280) Timeline server integration for DAG history

2014-08-28 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1280:
-

Assignee: Prakash Ramachandran

> Timeline server integration for DAG history
> ---
>
> Key: TEZ-1280
> URL: https://issues.apache.org/jira/browse/TEZ-1280
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Prakash Ramachandran
>Priority: Critical
>
> Umbrella jira to detail out all tasks to complete integration of Tez Client 
> and DAG for history.





[jira] [Commented] (TEZ-1511) MROutputConfigBuilder sets OutputFormat as String class if OutputFormat is not provided

2014-08-28 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114148#comment-14114148
 ] 

Bikas Saha commented on TEZ-1511:
-

bq. The use of NEW_API_CONFIG seems incorrect. 
Not sure what you mean here. The field is different for the Input and the 
Output. The Input always has mapper and the Output always has reducer. The 
patch removes a bug in the Output where it was looking at mapper. The private 
constant prevents such bugs in the future.

I can change to use Configuration::getClassByName instead.

> MROutputConfigBuilder sets OutputFormat as String class if OutputFormat is 
> not provided
> ---
>
> Key: TEZ-1511
> URL: https://issues.apache.org/jira/browse/TEZ-1511
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Bikas Saha
>Priority: Blocker
> Attachments: TEZ-1511.1.patch, TEZ-1511.2.patch
>
>
> Code uses: 
> {code}
>   this.outputFormat = 
> ReflectionUtils.getClass(conf.get(MRJobConfig.OUTPUT_FORMAT_CLASS_ATTR));
> } else {
>   this.outputFormat = 
> ReflectionUtils.getClass(conf.get("mapred.output.format.class"));
> {code}
> where ReflectionUtils has :
> {code}
>  <T> Class<T> getClass(T o)
> {code}





[jira] [Updated] (TEZ-1516) Log transfer rate for Broadcast Fetch

2014-08-28 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1516:


Attachment: TEZ-1516.1.txt

Simple patch to log transfer times for individual fetches, as well as an 
average.
[~gopalv] - please review.

> Log transfer rate for Broadcast Fetch
> -
>
> Key: TEZ-1516
> URL: https://issues.apache.org/jira/browse/TEZ-1516
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: TEZ-1516.1.txt
>
>






[jira] [Updated] (TEZ-1504) Ordered Input Shuffle can hang if there's errors while creating the Fetcher

2014-08-28 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1504:
-

Priority: Critical  (was: Major)

> Ordered Input Shuffle can hang if there's errors while creating the Fetcher
> ---
>
> Key: TEZ-1504
> URL: https://issues.apache.org/jira/browse/TEZ-1504
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Priority: Critical
>
> As an example, a missing codec will cause the Fetcher to throw an exception - 
> which causes the tracking thread to die.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (TEZ-1517) Avoid sending routed events via the AsyncDispatcher

2014-08-28 Thread Siddharth Seth (JIRA)
Siddharth Seth created TEZ-1517:
---

 Summary: Avoid sending routed events via the AsyncDispatcher
 Key: TEZ-1517
 URL: https://issues.apache.org/jira/browse/TEZ-1517
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Priority: Critical


Sending them via the queue ends up creating lots of unnecessary objects 
(millions for a large job), as well as blocking the queue.

Eventually, event routing should be handed over to a separate thread - so that 
the AsyncDispatcher is unblocked to continue operations like launching tasks, 
etc.





[jira] [Updated] (TEZ-1510) TezConfiguration should not add tez-site.xml as a default resource.

2014-08-28 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1510:
-

Attachment: TEZ-1510.2.patch

Modified test to compile without patch. 

> TezConfiguration should not add tez-site.xml as a default resource. 
> 
>
> Key: TEZ-1510
> URL: https://issues.apache.org/jira/browse/TEZ-1510
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
>Priority: Blocker
> Attachments: TEZ-1510.1.patch, TEZ-1510.2.patch
>
>
> Currently on the first construction of a TezConfiguration, tez-site.xml gets 
> added as a static resource for all future Configuration objects.





[jira] [Updated] (TEZ-1517) Avoid sending routed events via the AsyncDispatcher

2014-08-28 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1517:


Attachment: TEZ-1517.1.txt

Simple patch, to send events directly to tasks.

[~bikassaha], [~hitesh] - review please.

> Avoid sending routed events via the AsyncDispatcher
> ---
>
> Key: TEZ-1517
> URL: https://issues.apache.org/jira/browse/TEZ-1517
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Critical
> Attachments: TEZ-1517.1.txt
>
>
> Sending them via the queue ends up creating lots of unnecessary objects 
> (millions for a large job), as well as blocking the queue.
> Eventually, event routing should be handed over to a separate thread - so 
> that the AsyncDispatcher is unblocked to continue operations like launching 
> tasks, etc.





[jira] [Commented] (TEZ-1510) TezConfiguration should not add tez-site.xml as a default resource.

2014-08-28 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114269#comment-14114269
 ] 

Siddharth Seth commented on TEZ-1510:
-

+1. Looks good. It's a little risky, since we don't know if there was some 
specific place where we're inadvertently relying on tez-site being part of the 
Configuration because it was a default resource. Most of the runtime bits 
should be fine, since they work off of a payload sent from the client side.

> TezConfiguration should not add tez-site.xml as a default resource. 
> 
>
> Key: TEZ-1510
> URL: https://issues.apache.org/jira/browse/TEZ-1510
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
>Priority: Blocker
> Attachments: TEZ-1510.1.patch, TEZ-1510.2.patch
>
>
> Currently on the first construction of a TezConfiguration, tez-site.xml gets 
> added as a static resource for all future Configuration objects.





[jira] [Commented] (TEZ-1310) Update website documentation framework

2014-08-28 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114270#comment-14114270
 ] 

Jonathan Eagles commented on TEZ-1310:
--

One thing I forgot to mention is that the markdown back-end for 
maven-site-plugin (doxia-markdown-plugin) is pegdown 
https://github.com/sirthias/pegdown and it doesn't support roman numeral html 
lists.

> Update website documentation framework
> --
>
> Key: TEZ-1310
> URL: https://issues.apache.org/jira/browse/TEZ-1310
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Jonathan Eagles
> Attachments: TEZ-1310-v1.patch, TEZ-1310-v2.patch
>
>
> A better option for docs would be to use markdown format. Also, it might be 
> worth investigating moving to cms instead of svnpubsub. 
> https://www.apache.org/dev/project-site.html





[jira] [Updated] (TEZ-1510) TezConfiguration should not add tez-site.xml as a default resource.

2014-08-28 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1510:
-

Attachment: TEZ-1510.3.patch

Minor tweak to test. Committing shortly.

> TezConfiguration should not add tez-site.xml as a default resource. 
> 
>
> Key: TEZ-1510
> URL: https://issues.apache.org/jira/browse/TEZ-1510
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
>Priority: Blocker
> Attachments: TEZ-1510.1.patch, TEZ-1510.2.patch, TEZ-1510.3.patch
>
>
> Currently on the first construction of a TezConfiguration, tez-site.xml gets 
> added as a static resource for all future Configuration objects.





[jira] [Updated] (TEZ-1511) MROutputConfigBuilder sets OutputFormat as String class if OutputFormat is not provided

2014-08-28 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-1511:


Attachment: TEZ-1511.3.patch

Made the changes. Changed a test to use this version of the API.

> MROutputConfigBuilder sets OutputFormat as String class if OutputFormat is 
> not provided
> ---
>
> Key: TEZ-1511
> URL: https://issues.apache.org/jira/browse/TEZ-1511
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Bikas Saha
>Priority: Blocker
> Attachments: TEZ-1511.1.patch, TEZ-1511.2.patch, TEZ-1511.3.patch
>
>
> Code uses: 
> {code}
>   this.outputFormat = 
> ReflectionUtils.getClass(conf.get(MRJobConfig.OUTPUT_FORMAT_CLASS_ATTR));
> } else {
>   this.outputFormat = 
> ReflectionUtils.getClass(conf.get("mapred.output.format.class"));
> {code}
> where ReflectionUtils has :
> {code}
>  <T> Class<T> getClass(T o)
> {code}





[jira] [Updated] (TEZ-1376) Investigate independent parallel DAGs execution in Local Mode

2014-08-28 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated TEZ-1376:
-

Summary: Investigate independent parallel DAGs execution in Local Mode  
(was: Support independent parallel DAGs execution in Local Mode)

> Investigate independent parallel DAGs execution in Local Mode
> -
>
> Key: TEZ-1376
> URL: https://issues.apache.org/jira/browse/TEZ-1376
> Project: Apache Tez
>  Issue Type: Sub-task
>Affects Versions: 0.4.1
>Reporter: Chen He
> Attachments: differentParallelJob-ErrorOnTerminal.txt, 
> differentParallelJob-surefireOut.txt, differentParallelJob.patch, 
> simpleTestCase.patch
>
>
> Pig on Tez allows users to submit parallel DAGs in a single Pig script. Those 
> DAGs could be independent and concurrent. Current LocalMode may encounter 
> some problems when concurrent parallel DAGs are submitted.   





[jira] [Assigned] (TEZ-1376) Investigate independent parallel DAGs execution in Local Mode

2014-08-28 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He reassigned TEZ-1376:


Assignee: Chen He

> Investigate independent parallel DAGs execution in Local Mode
> -
>
> Key: TEZ-1376
> URL: https://issues.apache.org/jira/browse/TEZ-1376
> Project: Apache Tez
>  Issue Type: Sub-task
>Affects Versions: 0.4.1
>Reporter: Chen He
>Assignee: Chen He
> Attachments: differentParallelJob-ErrorOnTerminal.txt, 
> differentParallelJob-surefireOut.txt, differentParallelJob.patch, 
> simpleTestCase.patch
>
>
> Pig on Tez allows users to submit parallel DAGs in a single Pig script. Those 
> DAGs could be independent and concurrent. Current LocalMode may encounter 
> some problems when concurrent parallel DAGs are submitted.   





[jira] [Created] (TEZ-1518) Clean up ID caches on DAG completion

2014-08-28 Thread Siddharth Seth (JIRA)
Siddharth Seth created TEZ-1518:
---

 Summary: Clean up ID caches on DAG completion
 Key: TEZ-1518
 URL: https://issues.apache.org/jira/browse/TEZ-1518
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth








[jira] [Commented] (TEZ-1508) Log a warning if Xmx is configured incorrectly.

2014-08-28 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114346#comment-14114346
 ] 

Siddharth Seth commented on TEZ-1508:
-

Should we just fail if this is configured incorrectly?

> Log a warning if Xmx is configured incorrectly. 
> 
>
> Key: TEZ-1508
> URL: https://issues.apache.org/jira/browse/TEZ-1508
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Jonathan Eagles
>  Labels: newbie
>
> Users may incorrectly configure Xmx for tasks. Xmx should in most cases be 
> less than the container/task size given that YARN will kill containers that 
> exceed the memory limits. 
> Given that we already parse the java opts to detect Xmx in the java opts, it 
> should be trivial to add a check if the value is valid and log a warning if 
> not ( for both the Tez AM and the vertices in a DAG).  
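The check described above could look roughly like the following. This is a sketch under assumptions, not Tez code: the class `XmxCheck`, its method names, and the choice to warn when Xmx is equal to or larger than the container size are all hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of Tez. */
final class XmxCheck {
  // Matches -Xmx<number><optional unit>, e.g. -Xmx1024m or -Xmx2g.
  private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([kKmMgG]?)");

  /** Returns the last -Xmx value in megabytes, or -1 if none is present. */
  static long parseXmxMb(String javaOpts) {
    Matcher m = XMX.matcher(javaOpts);
    long mb = -1;
    while (m.find()) { // keep the last occurrence (assumed to be the effective one)
      long value = Long.parseLong(m.group(1));
      String unit = m.group(2).toLowerCase();
      switch (unit) {
        case "g": mb = value * 1024; break;
        case "m": mb = value; break;
        case "k": mb = value / 1024; break;
        default:  mb = value / (1024 * 1024); break; // no unit: plain bytes
      }
    }
    return mb;
  }

  /** Returns a warning message, or null when Xmx is absent or below the container size. */
  static String check(String javaOpts, long containerMb) {
    long xmxMb = parseXmxMb(javaOpts);
    if (xmxMb >= 0 && xmxMb >= containerMb) {
      return "Xmx (" + xmxMb + " MB) is not less than the container size ("
          + containerMb + " MB)";
    }
    return null;
  }
}
```

A caller would log the returned message at warning level rather than fail, which is what the issue asks for.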





[jira] [Commented] (TEZ-1499) Add OrderedJoinExample to tez-examples

2014-08-28 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114345#comment-14114345
 ] 

Siddharth Seth commented on TEZ-1499:
-

Joins are a common operation, so separate examples that highlight the 
different kinds would be useful.
An alternate example could be used to demonstrate the composable nature - 
wordcount for example, which can be implemented with an unsorted edge.

> Add OrderedJoinExample to tez-examples
> --
>
> Key: TEZ-1499
> URL: https://issues.apache.org/jira/browse/TEZ-1499
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
>
> In the current join example, the inputs of JoinProcessor are unordered, so it 
> always needs to load one input into memory and stream the other. This only 
> works when one dataset is small enough to fit into memory ( even without 
> broadcast, memory may not be enough ). So I'd like to add another join 
> example that makes the inputs of JoinProcessor ordered ( using 
> OrderedPartitionedKVEdgeConfig ). This kind of join could be used when both 
> datasets are large.
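To illustrate why ordered inputs remove the need to hold one side in memory, here is a minimal merge-join sketch. It is an illustration only, not the proposed example: `SortedMergeJoin` is a hypothetical name, in-memory lists stand in for streamed inputs, and keys are assumed to be unique within each input.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; not the Tez example itself. */
final class SortedMergeJoin {
  /**
   * Inner-joins two ascending key lists by advancing one cursor per side.
   * Assumes keys are unique within each input.
   */
  static List<String> join(List<String> left, List<String> right) {
    List<String> out = new ArrayList<>();
    int i = 0;
    int j = 0;
    while (i < left.size() && j < right.size()) {
      int c = left.get(i).compareTo(right.get(j));
      if (c < 0) {
        i++;                    // left key is smaller: advance left
      } else if (c > 0) {
        j++;                    // right key is smaller: advance right
      } else {
        out.add(left.get(i));   // keys match: emit and advance both
        i++;
        j++;
      }
    }
    return out;
  }
}
```

Each side is read once, front to back, so only the current element of each input has to be available at any time.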





[jira] [Updated] (TEZ-1376) Investigate independent parallel DAGs execution in Local Mode

2014-08-28 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated TEZ-1376:
-

Attachment: (was: differentParallelJob-ErrorOnTerminal.txt)

> Investigate independent parallel DAGs execution in Local Mode
> -
>
> Key: TEZ-1376
> URL: https://issues.apache.org/jira/browse/TEZ-1376
> Project: Apache Tez
>  Issue Type: Sub-task
>Affects Versions: 0.4.1
>Reporter: Chen He
>Assignee: Chen He
> Attachments: differentParallelJob-surefireOut.txt, 
> differentParallelJob.patch, simpleTestCase.patch
>
>
> Pig on Tez allows users to submit parallel DAGs in a single Pig script. Those 
> DAGs could be independent and concurrent. Current LocalMode may encounter 
> some problems when concurrent parallel DAGs are submitted.   





[jira] [Updated] (TEZ-1510) TezConfiguration should not add tez-site.xml as a default resource.

2014-08-28 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1510:
-

Attachment: TEZ-1510.3.addendum.patch

> TezConfiguration should not add tez-site.xml as a default resource. 
> 
>
> Key: TEZ-1510
> URL: https://issues.apache.org/jira/browse/TEZ-1510
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
>Priority: Blocker
> Fix For: 0.5.0
>
> Attachments: TEZ-1510.1.patch, TEZ-1510.2.patch, 
> TEZ-1510.3.addendum.patch, TEZ-1510.3.patch
>
>
> Currently on the first construction of a TezConfiguration, tez-site.xml gets 
> added as a static resource for all future Configuration objects.





[jira] [Commented] (TEZ-1511) MROutputConfigBuilder sets OutputFormat as String class if OutputFormat is not provided

2014-08-28 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114361#comment-14114361
 ] 

Siddharth Seth commented on TEZ-1511:
-

"mapred.input.format.class" - would be good to have this as a constant as well, 
similar to the new API constant. Likewise for the OutputFormat.
Why does the UnionExample need to change ?

A simple test to validate the correct path would be useful.

Rest looks good.

> MROutputConfigBuilder sets OutputFormat as String class if OutputFormat is 
> not provided
> ---
>
> Key: TEZ-1511
> URL: https://issues.apache.org/jira/browse/TEZ-1511
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Bikas Saha
>Priority: Blocker
> Attachments: TEZ-1511.1.patch, TEZ-1511.2.patch, TEZ-1511.3.patch
>
>
> Code uses: 
> {code}
>   this.outputFormat = 
> ReflectionUtils.getClass(conf.get(MRJobConfig.OUTPUT_FORMAT_CLASS_ATTR));
> } else {
>   this.outputFormat = 
> ReflectionUtils.getClass(conf.get("mapred.output.format.class"));
> {code}
> where ReflectionUtils has :
> {code}
>  <T> Class<T> getClass(T o)
> {code}





[jira] [Commented] (TEZ-1508) Log a warning if Xmx is configured incorrectly.

2014-08-28 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114363#comment-14114363
 ] 

Hitesh Shah commented on TEZ-1508:
--

No guarantees on whether this is intentionally done by a user on a cluster with 
memory-monitoring disabled. 

> Log a warning if Xmx is configured incorrectly. 
> 
>
> Key: TEZ-1508
> URL: https://issues.apache.org/jira/browse/TEZ-1508
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Jonathan Eagles
>  Labels: newbie
>
> Users may incorrectly configure Xmx for tasks. Xmx should in most cases be 
> less than the container/task size given that YARN will kill containers that 
> exceed the memory limits. 
> Given that we already parse the java opts to detect Xmx in the java opts, it 
> should be trivial to add a check if the value is valid and log a warning if 
> not ( for both the Tez AM and the vertices in a DAG).  





[jira] [Comment Edited] (TEZ-1508) Log a warning if Xmx is configured incorrectly.

2014-08-28 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114363#comment-14114363
 ] 

Hitesh Shah edited comment on TEZ-1508 at 8/28/14 9:19 PM:
---

No guarantees on whether this is intentionally done by a user on a cluster with 
memory-monitoring disabled. Also, a high Xmx need not imply the process will 
use that much memory. 


was (Author: hitesh):
No guarantees on whether this is intentionally done by a user on a cluster with 
memory-monitoring disabled. 

> Log a warning if Xmx is configured incorrectly. 
> 
>
> Key: TEZ-1508
> URL: https://issues.apache.org/jira/browse/TEZ-1508
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Jonathan Eagles
>  Labels: newbie
>
> Users may incorrectly configure Xmx for tasks. Xmx should in most cases be 
> less than the container/task size given that YARN will kill containers that 
> exceed the memory limits. 
> Given that we already parse the java opts to detect Xmx in the java opts, it 
> should be trivial to add a check if the value is valid and log a warning if 
> not ( for both the Tez AM and the vertices in a DAG).  





[jira] [Commented] (TEZ-1509) Set a useful default value for java opts

2014-08-28 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114371#comment-14114371
 ] 

Gopal V commented on TEZ-1509:
--

+1 on the new options.

> Set a useful default value for java opts  
> --
>
> Key: TEZ-1509
> URL: https://issues.apache.org/jira/browse/TEZ-1509
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Bikas Saha
> Attachments: TEZ-1509.1.patch
>
>
> A subset of the following should be considered for the defaults:
> -server -XX:+UseCompressedStrings -Djava.net.preferIPv4Stack=true 
> -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA 
> -XX:+UseParallelGC





[jira] [Updated] (TEZ-1376) Investigate independent parallel DAGs execution in Local Mode

2014-08-28 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated TEZ-1376:
-

Attachment: (was: differentParallelJob.patch)

> Investigate independent parallel DAGs execution in Local Mode
> -
>
> Key: TEZ-1376
> URL: https://issues.apache.org/jira/browse/TEZ-1376
> Project: Apache Tez
>  Issue Type: Sub-task
>Affects Versions: 0.4.1
>Reporter: Chen He
>Assignee: Chen He
> Attachments: simpleTestCase.patch
>
>
> Pig on Tez allows users to submit parallel DAGs in a single Pig script. Those 
> DAGs could be independent and concurrent. Current LocalMode may encounter 
> some problems when concurrent parallel DAGs are submitted.   





[jira] [Updated] (TEZ-1376) Investigate independent parallel DAGs execution in Local Mode

2014-08-28 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated TEZ-1376:
-

Attachment: (was: differentParallelJob-surefireOut.txt)

> Investigate independent parallel DAGs execution in Local Mode
> -
>
> Key: TEZ-1376
> URL: https://issues.apache.org/jira/browse/TEZ-1376
> Project: Apache Tez
>  Issue Type: Sub-task
>Affects Versions: 0.4.1
>Reporter: Chen He
>Assignee: Chen He
> Attachments: simpleTestCase.patch
>
>
> Pig on Tez allows users to submit parallel DAGs in a single Pig script. Those 
> DAGs could be independent and concurrent. Current LocalMode may encounter 
> some problems when concurrent parallel DAGs are submitted.   





[jira] [Commented] (TEZ-1495) ATS integration for TezClient

2014-08-28 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114393#comment-14114393
 ] 

Hitesh Shah commented on TEZ-1495:
--

Overall comments:
   - please look at TimelineClientImpl in yarn code for supporting secure 
clusters
   - At this point, it looks like there is no timeline client dependency so 
this should be ok to be in tez-api.
   - Regarding the current switch between AM and ATS, I think it should 
probably be a one-time switch. Once the client switches over to ATS, it should 
stick to ATS. This implies knowing when the final switchover needs to happen 
i.e on AM completion or when the session switches to a different DAG. You may 
wish to look at TestOrderedWordCount and modify it to test whether DAGClient 
for the client for the first dag works as intended while the second dag is 
running. 

Regarding

{code}
+// Status of the DAG is updated only when it completes. default to RUNNING if no status found
+// as the ATS is not updated until the status of dag is running.
{code}
   - maybe the DAG status should be updated to running whenever the dag started 
event is logged? 
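The one-time switch suggested above can be sketched as a client that never returns to the AM once it has fallen back to ATS. Everything here is hypothetical: `StickyStatusClient`, the two `Supplier` sources, and the use of an exception from the AM source as the switchover trigger are stand-ins, since the comment leaves the exact trigger (AM completion or a switch to a different DAG) open.

```java
import java.util.function.Supplier;

/** Hypothetical illustration of a one-time AM-to-ATS switch; not Tez code. */
final class StickyStatusClient {
  private final Supplier<String> amSource;  // assumed to throw when the AM is gone
  private final Supplier<String> atsSource; // assumed to read status from ATS
  private boolean useAts = false;

  StickyStatusClient(Supplier<String> amSource, Supplier<String> atsSource) {
    this.amSource = amSource;
    this.atsSource = atsSource;
  }

  /** Asks the AM until it fails once, then sticks to ATS for all later calls. */
  synchronized String getStatus() {
    if (!useAts) {
      try {
        return amSource.get();
      } catch (RuntimeException e) {
        useAts = true; // one-time switch: never go back to the AM
      }
    }
    return atsSource.get();
  }
}
```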



> ATS integration for TezClient
> -
>
> Key: TEZ-1495
> URL: https://issues.apache.org/jira/browse/TEZ-1495
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Prakash Ramachandran
>Assignee: Prakash Ramachandran
> Attachments: TEZ-1495.1.patch, TEZ-1495.2.patch, TEZ-1495.WIP.1.patch
>
>
> Tez client should automatically redirect to ATS when the AM is not running.
> All APIs exposed ( DAG status, counters, etc ) from the DAGClient should 
> continue to work after the AM has shut down.





[jira] [Updated] (TEZ-1376) Investigate independent parallel DAGs execution in Local Mode

2014-08-28 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated TEZ-1376:
-

Attachment: independantDAG2.data
independantDAG1.data
independantDAG.pig

a pig example with input files to test TEZ Local Mode

> Investigate independent parallel DAGs execution in Local Mode
> -
>
> Key: TEZ-1376
> URL: https://issues.apache.org/jira/browse/TEZ-1376
> Project: Apache Tez
>  Issue Type: Sub-task
>Affects Versions: 0.4.1
>Reporter: Chen He
>Assignee: Chen He
> Attachments: independantDAG.pig, independantDAG1.data, 
> independantDAG2.data, simpleTestCase.patch
>
>
> Pig on Tez allows users to submit parallel DAGs in a single Pig script. Those 
> DAGs could be independent and concurrent. Current LocalMode may encounter 
> some problems when concurrent parallel DAGs are submitted.   





[jira] [Comment Edited] (TEZ-1376) Investigate independent parallel DAGs execution in Local Mode

2014-08-28 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114392#comment-14114392
 ] 

Chen He edited comment on TEZ-1376 at 8/28/14 9:37 PM:
---

a pig example with input files to test TEZ Local Mode running independent DAGs.


was (Author: airbots):
a pig example with input files to test TEZ Local Mode

> Investigate independent parallel DAGs execution in Local Mode
> -
>
> Key: TEZ-1376
> URL: https://issues.apache.org/jira/browse/TEZ-1376
> Project: Apache Tez
>  Issue Type: Sub-task
>Affects Versions: 0.4.1
>Reporter: Chen He
>Assignee: Chen He
> Attachments: independantDAG.pig, independantDAG1.data, 
> independantDAG2.data, simpleTestCase.patch
>
>
> Pig on Tez allows users to submit parallel DAGs in a single Pig script. Those 
> DAGs could be independent and concurrent. Current LocalMode may encounter 
> some problems when concurrent parallel DAGs are submitted.   





[jira] [Issue Comment Deleted] (TEZ-1376) Investigate independent parallel DAGs execution in Local Mode

2014-08-28 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated TEZ-1376:
-

Comment: was deleted

(was: The sequential number looks incorrect since LocalClient uses 
"appIdNumber++" to create the incremental sequential number "0001". This is not 
a critical issue. What matters is that I created both session and non-session 
tests that run different jobs in parallel. Both of them report 
"org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez 
jars, tez.lib.uris is not defined in the configurartion". I am investigating 
the reason.)

> Investigate independent parallel DAGs execution in Local Mode
> -
>
> Key: TEZ-1376
> URL: https://issues.apache.org/jira/browse/TEZ-1376
> Project: Apache Tez
>  Issue Type: Sub-task
>Affects Versions: 0.4.1
>Reporter: Chen He
>Assignee: Chen He
> Attachments: independantDAG.pig, independantDAG1.data, 
> independantDAG2.data, simpleTestCase.patch
>
>
> Pig on Tez allows users to submit parallel DAGs in a single Pig script. Those 
> DAGs could be independent and concurrent. Current LocalMode may encounter 
> some problems when concurrent parallel DAGs are submitted.   





[jira] [Updated] (TEZ-1510) TezConfiguration should not add tez-site.xml as a default resource.

2014-08-28 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1510:
-

Attachment: TEZ-1510.3.missing-file.patch

Add missing file needed for test

> TezConfiguration should not add tez-site.xml as a default resource. 
> 
>
> Key: TEZ-1510
> URL: https://issues.apache.org/jira/browse/TEZ-1510
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
>Priority: Blocker
> Fix For: 0.5.0
>
> Attachments: TEZ-1510.1.patch, TEZ-1510.2.patch, 
> TEZ-1510.3.addendum.patch, TEZ-1510.3.missing-file.patch, TEZ-1510.3.patch
>
>
> Currently on the first construction of a TezConfiguration, tez-site.xml gets 
> added as a static resource for all future Configuration objects.





[jira] [Commented] (TEZ-1488) Implement HashComparator in TezBytesComparator

2014-08-28 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114428#comment-14114428
 ] 

Siddharth Seth commented on TEZ-1488:
-

+1. Looks good.
Some documentation on what the HashComparator interface is used for would be 
useful. Also, is getHashCode the correct name? Something like 
extractKeyMSBytes may be more appropriate.

> Implement HashComparator in TezBytesComparator
> -
>
> Key: TEZ-1488
> URL: https://issues.apache.org/jira/browse/TEZ-1488
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: TEZ-1488.1.patch
>
>
> Speed up TezBytesComparator by ~20% when used in PipelinedSorter.
> This moves part of the key comparator into the partition comparator, which is 
> a single register operation.
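The idea of comparing a small key prefix before the full key can be sketched as follows. This is not the patch: `PrefixCompare` and its methods are hypothetical, and packing the first four bytes into an int is an assumption about what extracting the most significant key bytes could mean here.

```java
/** Hypothetical illustration of prefix-first comparison; not the Tez patch. */
final class PrefixCompare {
  /** Packs the first four key bytes (zero padded) into an int that orders as unsigned. */
  static int prefix(byte[] key) {
    int p = 0;
    for (int i = 0; i < 4; i++) {
      int b = i < key.length ? (key[i] & 0xff) : 0;
      p = (p << 8) | b;
    }
    return p;
  }

  /** Compares prefixes first; falls back to a full unsigned byte compare on ties. */
  static int compare(byte[] a, byte[] b) {
    int c = Integer.compareUnsigned(prefix(a), prefix(b));
    if (c != 0) {
      return c; // decided by a single integer comparison
    }
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return a.length - b.length; // shorter key sorts first when one is a prefix
  }
}
```

In a sorter the prefix would typically be computed once per record and stored next to it, so most comparisons never touch the key bytes. The sketch recomputes it on every call only to stay short.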





[jira] [Commented] (TEZ-1488) Implement HashComparator in TezBytesComparator

2014-08-28 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114448#comment-14114448
 ] 

Gopal V commented on TEZ-1488:
--

I think we should call the interface what it really is - a ProxyComparator?

I will do the renames & write docs.

> Implement HashComparator in TezBytesComparator
> -
>
> Key: TEZ-1488
> URL: https://issues.apache.org/jira/browse/TEZ-1488
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: TEZ-1488.1.patch
>
>
> Speed up TezBytesComparator by ~20% when used in PipelinedSorter.
> This moves part of the key comparator into the partition comparator, which is 
> a single register operation.





[jira] [Commented] (TEZ-1517) Avoid sending routed events via the AsyncDispatcher

2014-08-28 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114460#comment-14114460
 ] 

Hitesh Shah commented on TEZ-1517:
--

Seems reasonable. I am assuming that this is after the events pass VertexImpl 
routing hence recovery will not be affected. 

The only problem is that the change is not thread-safe - the list is a simple 
array list and the function is not synchronized/locked as needed. 

> Avoid sending routed events via the AsyncDispatcher
> ---
>
> Key: TEZ-1517
> URL: https://issues.apache.org/jira/browse/TEZ-1517
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Critical
> Attachments: TEZ-1517.1.txt
>
>
> Sending them via the queue ends up creating lots of unnecessary objects 
> (millions for a large job), as well as blocking the queue.
> Eventually, event routing should be handed over to a separate thread - so 
> that the asyncdispatcher is unblocked to continue operations like launching 
> tasks, etc.





[jira] [Commented] (TEZ-1515) DAGAppMaster : Thread contentions due to org.apache.tez.common.counters.ResourceBundles

2014-08-28 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114486#comment-14114486
 ] 

Siddharth Seth commented on TEZ-1515:
-

[~rajesh.balamohan] - should we just remove support for resource bundles and 
localization? This would remove display names as well - which can be added back 
with a simpler mechanism.

> DAGAppMaster : Thread contentions due to 
> org.apache.tez.common.counters.ResourceBundles
> ---
>
> Key: TEZ-1515
> URL: https://issues.apache.org/jira/browse/TEZ-1515
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>  Labels: performance
> Attachments: DAGAppMaster_AsyncDispatcher.png, 
> HistoryLoggingService.png, RecoveryService.png, 
> detailed_sample_stack_trace.txt
>
>
> Thread profiling DagAppMaster for a synthetic tez test revealed lots of 
> contentions in RecoveryService / HistoryEventHandlingThread / AsyncDispatcher 
> threads.  All of these try to access tez counters and are blocked on "public 
> static synchronized <T> T getValue(String bundleName, String key, String 
> suffix, T defaultValue)".
> I will attach the thread profiler snapshots soon.





[jira] [Commented] (TEZ-1488) Implement HashComparator in TezBytesComparator

2014-08-28 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114489#comment-14114489
 ] 

Siddharth Seth commented on TEZ-1488:
-

ComparatorPrefixGenerator ?

> Implement HashComparator in TezBytesComparator
> -
>
> Key: TEZ-1488
> URL: https://issues.apache.org/jira/browse/TEZ-1488
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: TEZ-1488.1.patch
>
>
> Speed up TezBytesComparator by ~20% when used in PipelinedSorter.
> This moves part of the key comparator into the partition comparator, which is 
> a single register operation.





[jira] [Commented] (TEZ-1517) Avoid sending routed events via the AsyncDispatcher

2014-08-28 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114503#comment-14114503
 ] 

Siddharth Seth commented on TEZ-1517:
-

It's still running on the EventHandler thread - just not via separate events. 
So the usage continues to be the same as it was.

I haven't added a separate thread here, will be creating a separate jira for 
that and will address synchronization at that point.

> Avoid sending routed events via the AsyncDispatcher
> ---
>
> Key: TEZ-1517
> URL: https://issues.apache.org/jira/browse/TEZ-1517
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Critical
> Attachments: TEZ-1517.1.txt
>
>
> Sending them via the queue ends up creating lots of unnecessary objects 
> (millions for a large job), as well as blocking the queue.
> Eventually, event routing should be handed over to a separate thread - so 
> that the asyncdispatcher is unblocked to continue operations like launching 
> tasks, etc.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TEZ-1509) Set a useful default value for java opts

2014-08-28 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114527#comment-14114527
 ] 

Tsuyoshi OZAWA commented on TEZ-1509:
-

The new option looks good to me.

> Set a useful default value for java opts  
> --
>
> Key: TEZ-1509
> URL: https://issues.apache.org/jira/browse/TEZ-1509
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Bikas Saha
> Attachments: TEZ-1509.1.patch
>
>
> A subset of the following should be considered for the defaults:
> -server -XX:+UseCompressedStrings -Djava.net.preferIPv4Stack=true 
> -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA 
> -XX:+UseParallelGC





[jira] [Updated] (TEZ-1517) Avoid sending routed events via the AsyncDispatcher

2014-08-28 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1517:


Attachment: TEZ-1517.2.txt

This does require minimal locking even here - which was earlier being done by 
the handleEvent invocation.
Yes, recovery is not impacted - this is after the initial events come in.

Committing in a bit, thanks for the review.

> Avoid sending routed events via the AsyncDispatcher
> ---
>
> Key: TEZ-1517
> URL: https://issues.apache.org/jira/browse/TEZ-1517
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Critical
> Attachments: TEZ-1517.1.txt, TEZ-1517.2.txt
>
>
> Sending them via the queue ends up creating lots of unnecessary objects 
> (millions for a large job), as well as blocking the queue.
> Eventually, event routing should be handed over to a separate thread - so 
> that the asyncdispatcher is unblocked to continue operations like launching 
> tasks, etc.





[jira] [Created] (TEZ-1519) TezTaskRunner should not initialize TezConfiguration in TezChild

2014-08-28 Thread Hitesh Shah (JIRA)
Hitesh Shah created TEZ-1519:


 Summary: TezTaskRunner should not initialize TezConfiguration in 
TezChild
 Key: TEZ-1519
 URL: https://issues.apache.org/jira/browse/TEZ-1519
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Priority: Blocker


Should be doing a new Configuration and augmenting with the config data from 
tez-conf.pb. 

Need confirmation on tez-conf.pb being localized for all containers. 





[jira] [Updated] (TEZ-1515) DAGAppMaster : Thread contentions due to org.apache.tez.common.counters.ResourceBundles

2014-08-28 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1515:
--

Attachment: TEZ-1515.1.patch

Attachment removes the resource bundle/localization usage. [~sseth] Can you 
please review?

> DAGAppMaster : Thread contentions due to 
> org.apache.tez.common.counters.ResourceBundles
> ---
>
> Key: TEZ-1515
> URL: https://issues.apache.org/jira/browse/TEZ-1515
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>  Labels: performance
> Attachments: DAGAppMaster_AsyncDispatcher.png, 
> HistoryLoggingService.png, RecoveryService.png, TEZ-1515.1.patch, 
> detailed_sample_stack_trace.txt
>
>
> Thread profiling DagAppMaster for a synthetic tez test revealed lots of 
> contentions in RecoveryService / HistoryEventHandlingThread / AsyncDispatcher 
> threads.  All of these try to access tez counters and are blocked on "public 
> static synchronized <T> T getValue(String bundleName, String key, String 
> suffix, T defaultValue)".
> I will attach the thread profiler snapshots soon.





[jira] [Commented] (TEZ-1515) DAGAppMaster : Thread contentions due to org.apache.tez.common.counters.ResourceBundles

2014-08-28 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114597#comment-14114597
 ] 

Siddharth Seth commented on TEZ-1515:
-

Looks good to me. Should remove the ResourceBundles class as well if it isn't 
used.

> DAGAppMaster : Thread contentions due to 
> org.apache.tez.common.counters.ResourceBundles
> ---
>
> Key: TEZ-1515
> URL: https://issues.apache.org/jira/browse/TEZ-1515
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>  Labels: performance
> Attachments: DAGAppMaster_AsyncDispatcher.png, 
> HistoryLoggingService.png, RecoveryService.png, TEZ-1515.1.patch, 
> detailed_sample_stack_trace.txt
>
>
> Thread profiling DagAppMaster for a synthetic tez test revealed lots of 
> contentions in RecoveryService / HistoryEventHandlingThread / AsyncDispatcher 
> threads.  All of these try to access tez counters and are blocked on "public 
> static synchronized <T> T getValue(String bundleName, String key, String 
> suffix, T defaultValue)".
> I will attach the thread profiler snapshots soon.





[jira] [Commented] (TEZ-1515) DAGAppMaster : Thread contentions due to org.apache.tez.common.counters.ResourceBundles

2014-08-28 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114598#comment-14114598
 ] 

Siddharth Seth commented on TEZ-1515:
-

And open a follow up jira to support displayNames better.

> DAGAppMaster : Thread contentions due to 
> org.apache.tez.common.counters.ResourceBundles
> ---
>
> Key: TEZ-1515
> URL: https://issues.apache.org/jira/browse/TEZ-1515
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>  Labels: performance
> Attachments: DAGAppMaster_AsyncDispatcher.png, 
> HistoryLoggingService.png, RecoveryService.png, TEZ-1515.1.patch, 
> detailed_sample_stack_trace.txt
>
>
> Thread profiling DagAppMaster for a synthetic tez test revealed lots of 
> contentions in RecoveryService / HistoryEventHandlingThread / AsyncDispatcher 
> threads.  All of these try to access tez counters and are blocked on "public 
> static synchronized <T> T getValue(String bundleName, String key, String 
> suffix, T defaultValue)".
> I will attach the thread profiler snapshots soon.





[jira] [Updated] (TEZ-1515) DAGAppMaster : Thread contentions due to org.apache.tez.common.counters.ResourceBundles

2014-08-28 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1515:
--

Attachment: TEZ-1515.2.patch

Removed ResourceBundles in attached patch.

> DAGAppMaster : Thread contentions due to 
> org.apache.tez.common.counters.ResourceBundles
> ---
>
> Key: TEZ-1515
> URL: https://issues.apache.org/jira/browse/TEZ-1515
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>  Labels: performance
> Attachments: DAGAppMaster_AsyncDispatcher.png, 
> HistoryLoggingService.png, RecoveryService.png, TEZ-1515.1.patch, 
> TEZ-1515.2.patch, detailed_sample_stack_trace.txt
>
>
> Thread profiling DagAppMaster for a synthetic tez test revealed lots of 
> contentions in RecoveryService / HistoryEventHandlingThread / AsyncDispatcher 
> threads.  All of these try to access tez counters and are blocked on "public 
> static synchronized <T> T getValue(String bundleName, String key, String 
> suffix, T defaultValue)".
> I will attach the thread profiler snapshots soon.





[jira] [Created] (TEZ-1520) TezCounters: support displayNames better

2014-08-28 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created TEZ-1520:
-

 Summary: TezCounters: support displayNames better
 Key: TEZ-1520
 URL: https://issues.apache.org/jira/browse/TEZ-1520
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan








[jira] [Updated] (TEZ-1520) TezCounters: support better displayNames

2014-08-28 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1520:
--

Summary: TezCounters: support better displayNames  (was: TezCounters: 
support displayNames better)

> TezCounters: support better displayNames
> 
>
> Key: TEZ-1520
> URL: https://issues.apache.org/jira/browse/TEZ-1520
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>






[jira] [Commented] (TEZ-1495) ATS integration for TezClient

2014-08-28 Thread Prakash Ramachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114678#comment-14114678
 ] 

Prakash Ramachandran commented on TEZ-1495:
---

[~hitesh]
bq. Regarding the current switch between AM and ATS, I think it should probably 
be a one-time switch.
We can switch to ATS on AM completion. However, if it also handles the case of AM 
relaunch (i.e. it switched to ATS when the AM was not reachable) and the event for AM 
completion (from the new AM after it completes) for some reason does not reach ATS, 
won't this cause an indefinite wait?

> ATS integration for TezClient
> -
>
> Key: TEZ-1495
> URL: https://issues.apache.org/jira/browse/TEZ-1495
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Prakash Ramachandran
>Assignee: Prakash Ramachandran
> Attachments: TEZ-1495.1.patch, TEZ-1495.2.patch, TEZ-1495.WIP.1.patch
>
>
> Tez client should automatically redirect to ATS when the AM is not running.
> All APIs exposed ( DAG status, counters, etc ) from the DAGClient should 
> continue to work after the AM has shut down.





[jira] [Commented] (TEZ-1488) Implement HashComparator in TezBytesComparator

2014-08-28 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114746#comment-14114746
 ] 

Gopal V commented on TEZ-1488:
--

This was originally an interface for BinaryComparable called PrefixComparable 
etc.

http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-issues/201211.mbox/%3C1195493022.84126.1352331614565.JavaMail.jiratomcat@arcas%3E

I'd like to retain those roots with a ProxyComparator because what it returns 
is really a proxy for key comparisons.

That is, it implies that

getProxy(k1) < getProxy(k2) ==> k1 < k2 

but 

getProxy(k1) == getProxy(k2) ==> k1 ?? k2 (no relation)
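
The two rules above can be sketched as follows. This is only an illustration under stated assumptions, not the code in TEZ-1488.1.patch: the class name, the choice of a three-byte zero-padded prefix as the proxy, and the helper names are all hypothetical. A smaller proxy settles the order immediately, while equal proxies say nothing and force a full key comparison.

```java
// Hypothetical sketch: not the TEZ-1488 patch; names are illustrative.
final class ProxyComparatorSketch {
    // Packs the first three bytes (zero-padded) into a non-negative int,
    // so plain int comparison agrees with unsigned byte order on that prefix.
    static int getProxy(byte[] key) {
        int proxy = 0;
        for (int i = 0; i < 3; i++) {
            int b = i < key.length ? (key[i] & 0xff) : 0;
            proxy = (proxy << 8) | b;
        }
        return proxy;
    }

    // Full unsigned lexicographic comparison; a key that is a prefix of the other sorts first.
    static int compareFull(byte[] k1, byte[] k2) {
        int n = Math.min(k1.length, k2.length);
        for (int i = 0; i < n; i++) {
            int d = (k1[i] & 0xff) - (k2[i] & 0xff);
            if (d != 0) return d;
        }
        return k1.length - k2.length;
    }

    // Differing proxies decide the order; equal proxies imply no relation,
    // so the full comparison has to run.
    static int compare(byte[] k1, byte[] k2) {
        int p1 = getProxy(k1);
        int p2 = getProxy(k2);
        if (p1 != p2) return Integer.compare(p1, p2);
        return compareFull(k1, k2);
    }
}
```

Zero padding is what keeps the first rule safe for short keys: a key shorter than the prefix can only tie with or sort below a longer key that extends it, never above it.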

> Implement HashComparator in TezBytesComparator
> -
>
> Key: TEZ-1488
> URL: https://issues.apache.org/jira/browse/TEZ-1488
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: TEZ-1488.1.patch
>
>
> Speed up TezBytesComparator by ~20% when used in PipelinedSorter.
> This moves part of the key comparator into the partition comparator, which is 
> a single register operation.





[jira] [Commented] (TEZ-1510) TezConfiguration should not add tez-site.xml as a default resource.

2014-08-28 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114771#comment-14114771
 ] 

Bikas Saha commented on TEZ-1510:
-

Please comment on whether this is safe for 0.5.0 or not.

> TezConfiguration should not add tez-site.xml as a default resource. 
> 
>
> Key: TEZ-1510
> URL: https://issues.apache.org/jira/browse/TEZ-1510
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
>Priority: Blocker
> Fix For: 0.5.0
>
> Attachments: TEZ-1510.1.patch, TEZ-1510.2.patch, 
> TEZ-1510.3.addendum.patch, TEZ-1510.3.missing-file.patch, TEZ-1510.3.patch
>
>
> Currently on the first construction of a TezConfiguration, tez-site.xml gets 
> added as a static resource for all future Configuration objects.





[jira] [Commented] (TEZ-1511) MROutputConfigBuilder sets OutputFormat as String class if OutputFormat is not provided

2014-08-28 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114773#comment-14114773
 ] 

Bikas Saha commented on TEZ-1511:
-

UnionExample is run today as an e2e test for the build. So this captures e2e 
regression from a Hive/Pig usage point of view.
The Precondition check is already acting as a built-in self-test.

> MROutputConfigBuilder sets OutputFormat as String class if OutputFormat is 
> not provided
> ---
>
> Key: TEZ-1511
> URL: https://issues.apache.org/jira/browse/TEZ-1511
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Bikas Saha
>Priority: Blocker
> Attachments: TEZ-1511.1.patch, TEZ-1511.2.patch, TEZ-1511.3.patch
>
>
> Code uses: 
> {code}
>   this.outputFormat = 
> ReflectionUtils.getClass(conf.get(MRJobConfig.OUTPUT_FORMAT_CLASS_ATTR));
> } else {
>   this.outputFormat = 
> ReflectionUtils.getClass(conf.get("mapred.output.format.class"));
> {code}
> where ReflectionUtils has :
> {code}
>  <T> Class<T> getClass(T o)
> {code}
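
A small sketch of why the quoted calls yield the String class. It assumes that the `getClass(T o)` helper quoted above simply returns the runtime class of its argument. The stand-in method and the class name below are hypothetical, and plain `Class.forName` is used here only as a placeholder for whatever class-loading call the actual patch uses.

```java
// Hypothetical sketch: a local stand-in, not the Hadoop ReflectionUtils class itself.
final class GetClassPitfallSketch {
    // Mirrors the shape of the quoted helper: returns the runtime class of the object passed in.
    @SuppressWarnings("unchecked")
    static <T> Class<T> getClass(T o) {
        return (Class<T>) o.getClass();
    }

    // What the caller most likely intended: load the class that the string names.
    static Class<?> loadByName(String className) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown class: " + className, e);
        }
    }
}
```

Passing a configuration value such as "java.util.ArrayList" to the first method yields `String.class`, because the argument is a String; only the second method resolves the named class.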





[jira] [Updated] (TEZ-1511) MROutputConfigBuilder sets OutputFormat as String class if OutputFormat is not provided

2014-08-28 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-1511:


Attachment: TEZ-1511.4.patch

Attaching patch that creates configs for mapper/reduce.new-api in MRJobConfig 
instead of the private constants. These are still private to Tez overall.

> MROutputConfigBuilder sets OutputFormat as String class if OutputFormat is 
> not provided
> ---
>
> Key: TEZ-1511
> URL: https://issues.apache.org/jira/browse/TEZ-1511
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Bikas Saha
>Priority: Blocker
> Attachments: TEZ-1511.1.patch, TEZ-1511.2.patch, TEZ-1511.3.patch, 
> TEZ-1511.4.patch
>
>
> Code uses: 
> {code}
>   this.outputFormat = 
> ReflectionUtils.getClass(conf.get(MRJobConfig.OUTPUT_FORMAT_CLASS_ATTR));
> } else {
>   this.outputFormat = 
> ReflectionUtils.getClass(conf.get("mapred.output.format.class"));
> {code}
> where ReflectionUtils has :
> {code}
>  Class getClass(T o)
> {code}





[jira] [Commented] (TEZ-1495) ATS integration for TezClient

2014-08-28 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114849#comment-14114849
 ] 

Hitesh Shah commented on TEZ-1495:
--

bq. can switch to ATS on AM completion. however if it also handles the case of 
AM relaunch
Sorry, I should have clarified. It should be a one-time switch on application 
completion ( i.e. after all AM attempts finish ). 

Based on the current implementation, it switches when the AM process goes 
down, i.e. it would switch to ATS for a temporary period but then switch back to 
the next AM attempt.  However, at this point, it would also need to monitor the 
application report from YARN to check whether the application has completed or 
not.
 
bq. the event for AM completion (from new AM after it completes) for some 
reason does not reach ATS wont this cause a indefinite wait
Could you shed more light on this? The ATS data need not be definitive, though; 
polling the final application state from the RM would be enough to 
short-circuit the wait loop.
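
A rough sketch of the selection logic discussed in this comment, under assumptions: the class, enum, and method names are hypothetical and do not come from TezClient, DAGClient, or YARN. The two boolean inputs stand for "the RM reports the application as finished" and "the current AM attempt is reachable". The client falls back to ATS temporarily while the AM is down, and switches for good only once the RM says the application has completed.

```java
// Hypothetical sketch: illustrative names only, not TezClient or YARN APIs.
final class StatusSourceSketch {
    enum Source { AM, ATS }

    // Set once the RM reports application completion; never reset afterwards.
    private boolean switchedForGood = false;

    Source choose(boolean appFinishedPerRm, boolean amReachable) {
        if (switchedForGood) {
            return Source.ATS;
        }
        if (appFinishedPerRm) {
            // One-time switch: all AM attempts are over.
            switchedForGood = true;
            return Source.ATS;
        }
        // Application still running: use the AM when reachable,
        // otherwise fall back to ATS until the next attempt comes up.
        return amReachable ? Source.AM : Source.ATS;
    }
}
```

Because the permanent switch is driven by the RM's final state rather than by an event arriving in ATS, a lost AM-completion event would not by itself keep the client waiting.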


> ATS integration for TezClient
> -
>
> Key: TEZ-1495
> URL: https://issues.apache.org/jira/browse/TEZ-1495
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Prakash Ramachandran
>Assignee: Prakash Ramachandran
> Attachments: TEZ-1495.1.patch, TEZ-1495.2.patch, TEZ-1495.WIP.1.patch
>
>
> Tez client should automatically redirect to ATS when the AM is not running.
> All APIs exposed ( DAG status, counters, etc ) from the DAGClient should 
> continue to work after the AM has shut down.





[jira] [Comment Edited] (TEZ-1495) ATS integration for TezClient

2014-08-28 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114849#comment-14114849
 ] 

Hitesh Shah edited comment on TEZ-1495 at 8/29/14 4:42 AM:
---

bq. can switch to ATS on AM completion. however if it also handles the case of 
AM relaunch
Sorry, I should have clarified. It should be a one-time switch on application 
completion ( i.e. after all AM attempts finish ). 

Based on the current implementation, it switches when the AM process goes 
down, i.e. it would switch to ATS for a temporary period but then switch back to 
the next AM attempt.  However, at this point, it would also need to monitor the 
application report from YARN to check whether the application has completed or 
not.
 
bq. the event for AM completion (from new AM after it completes) for some 
reason does not reach ATS wont this cause a indefinite wait
Could you shed more light on this? The ATS data need not be definitive, though; 
polling the final application state from the RM would be enough to 
short-circuit the wait loop ( with some level of waiting to ensure that any 
delay in propagating data from AM to ATS is accounted for ).



was (Author: hitesh):
bq. can switch to ATS on AM completion. however if it also handles the case of 
AM relaunch
Sorry should have clarified. it should be a one-time switch on application 
completion ( i.e. after all AM attempts finish ). 

Based on the current implementation, it is switching when the AM process goes 
down i.e it would switch to ATS for a temporary period but then switch back to 
the next AM attempt.  However, at this point, it would also need to monitor the 
application report from YARN to check whether the application has completed or 
not.
 
bq. the event for AM completion (from new AM after it completes) for some 
reason does not reach ATS wont this cause a indefinite wait
Could you shed more clarity on this. The ATS data need not be definitive though 
polling the final application state from the RM would be enough to 
short-circuit the wait loop.


> ATS integration for TezClient
> -
>
> Key: TEZ-1495
> URL: https://issues.apache.org/jira/browse/TEZ-1495
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Prakash Ramachandran
>Assignee: Prakash Ramachandran
> Attachments: TEZ-1495.1.patch, TEZ-1495.2.patch, TEZ-1495.WIP.1.patch
>
>
> Tez client should automatically redirect to ATS when the AM is not running.
> All APIs exposed ( DAG status, counters, etc ) from the DAGClient should 
> continue to work after the AM has shut down.





[jira] [Commented] (TEZ-1510) TezConfiguration should not add tez-site.xml as a default resource.

2014-08-28 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114851#comment-14114851
 ] 

Hitesh Shah commented on TEZ-1510:
--

Ran a few example jobs and did not see any issues after this patch.

> TezConfiguration should not add tez-site.xml as a default resource. 
> 
>
> Key: TEZ-1510
> URL: https://issues.apache.org/jira/browse/TEZ-1510
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
>Priority: Blocker
> Fix For: 0.5.0
>
> Attachments: TEZ-1510.1.patch, TEZ-1510.2.patch, 
> TEZ-1510.3.addendum.patch, TEZ-1510.3.missing-file.patch, TEZ-1510.3.patch
>
>
> Currently on the first construction of a TezConfiguration, tez-site.xml gets 
> added as a static resource for all future Configuration objects.





[jira] [Resolved] (TEZ-1509) Set a useful default value for java opts

2014-08-28 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha resolved TEZ-1509.
-

   Resolution: Fixed
Fix Version/s: 0.5.0
 Hadoop Flags: Incompatible change,Reviewed  (was: Incompatible change)

commit db4161b03d6535d79ed5c337a190b55f3ea1f198
Author: Bikas Saha 
Date:   Thu Aug 28 21:51:29 2014 -0700

TEZ-1509. Set a useful default value for java opts (bikas)


> Set a useful default value for java opts  
> --
>
> Key: TEZ-1509
> URL: https://issues.apache.org/jira/browse/TEZ-1509
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Bikas Saha
> Fix For: 0.5.0
>
> Attachments: TEZ-1509.1.patch
>
>
> A subset of the following should be considered for the defaults:
> -server -XX:+UseCompressedStrings -Djava.net.preferIPv4Stack=true 
> -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA 
> -XX:+UseParallelGC





[jira] [Created] (TEZ-1521) VertexDataMovementEventsGeneratedEvent is logged twice in recovery log for InputDataInformation

2014-08-28 Thread Jeff Zhang (JIRA)
Jeff Zhang created TEZ-1521:
---

 Summary: VertexDataMovementEventsGeneratedEvent is logged twice in 
recovery log for InputDataInformation
 Key: TEZ-1521
 URL: https://issues.apache.org/jira/browse/TEZ-1521
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang








[jira] [Updated] (TEZ-1488) Implement HashComparator in TezBytesComparator

2014-08-28 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1488:
-

Attachment: TEZ-1488.2.patch

Patch with javadocs + test-case

> Implement HashComparator in TezBytesComparator
> -
>
> Key: TEZ-1488
> URL: https://issues.apache.org/jira/browse/TEZ-1488
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: TEZ-1488.1.patch, TEZ-1488.2.patch
>
>
> Speed up TezBytesComparator by ~20% when used in PipelinedSorter.
> This moves part of the key comparator into the partition comparator, which is 
> a single register operation.





[jira] [Updated] (TEZ-1521) VertexDataMovementEventsGeneratedEvent may be logged twice in recovery log

2014-08-28 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-1521:


Summary: VertexDataMovementEventsGeneratedEvent may be logged twice in 
recovery log   (was: VertexDataMovementEventsGeneratedEvent is logged twice in 
recovery log for InputDataInformation)

> VertexDataMovementEventsGeneratedEvent may be logged twice in recovery log 
> ---
>
> Key: TEZ-1521
> URL: https://issues.apache.org/jira/browse/TEZ-1521
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
>






[jira] [Commented] (TEZ-1516) Log transfer rate for Broadcast Fetch

2014-08-28 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114947#comment-14114947
 ] 

Siddharth Seth commented on TEZ-1516:
-

[~rajesh.balamohan] - could you please take a look.

> Log transfer rate for Broadcast Fetch
> -
>
> Key: TEZ-1516
> URL: https://issues.apache.org/jira/browse/TEZ-1516
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: TEZ-1516.1.txt
>
>



