[jira] [Comment Edited] (TEZ-853) Support counters recovery
[ https://issues.apache.org/jira/browse/TEZ-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113433#comment-14113433 ] Jeff Zhang edited comment on TEZ-853 at 8/28/14 7:06 AM: - [~hitesh] bq. Do TaskImpl and VertexImpl write counters to recovery but they are not used when restoring state? Should the counters be written or recovered from task attempts? If the latter, then we should not write them. No counters are written from VertexImpl or TaskImpl ( their counters come from TaskAttemptImpl ). Should we remove tezCounters from VertexFinishedProto and TaskFinishedProto, since we don't actually use them in recovery? bq. DAGImpl::restoreFromEvent does not seem to restore counters The DAG does not write any counters; its counters all come from TaskAttemptImpl. So as long as the counters of TaskAttemptImpl are recovered, the counters of the DAG are recovered. bq. in a scenario where the dag finished is logged and all other events are dropped, I assume counters will be needed? Yes, you are right. This is a special case. In this case we should write counters in DAGFinishedEvent and recover from it. ( will add it ) was (Author: zjffdu): [~hitesh] bq. Do TaskImpl and VertexImpl write counters to recovery but they are not used when restoring state? Should the counters be written or recovered from task attempts? If the latter, then we should not write them. No counters are written from VertexImpl or TaskImpl ( their counters come from TaskAttemptImpl ). bq. DAGImpl::restoreFromEvent does not seem to restore counters The DAG does not write any counters; its counters all come from TaskAttemptImpl. So as long as the counters of TaskAttemptImpl are recovered, the counters of the DAG are recovered. bq. in a scenario where the dag finished is logged and all other events are dropped, I assume counters will be needed? Yes, you are right. This is a special case. In this case we should write counters in DAGFinishedEvent and recover from it. 
( will add it ) > Support counters recovery > - > > Key: TEZ-853 > URL: https://issues.apache.org/jira/browse/TEZ-853 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Jeff Zhang > Attachments: Tez-853.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
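Editorial sketch of the recovery rule discussed in this comment, assuming counters can be modelled as plain name-to-value maps. The class and method names below are hypothetical and are not Tez's recovery API: DAG counters are rebuilt by summing task attempt counters, except in the special case where only the DAG-finished record survived, in which case the counters stored with that record are used.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for recovered state; not Tez's real recovery classes. */
final class CounterRecoverySketch {

  /** Sums per-attempt counter maps into one DAG-level map. */
  static Map<String, Long> aggregate(List<Map<String, Long>> attemptCounters) {
    Map<String, Long> total = new HashMap<>();
    for (Map<String, Long> counters : attemptCounters) {
      for (Map.Entry<String, Long> e : counters.entrySet()) {
        total.merge(e.getKey(), e.getValue(), Long::sum);
      }
    }
    return total;
  }

  /**
   * Prefers counters stored with the DAG-finished record (the special case
   * where all other events were dropped); otherwise rebuilds them from the
   * recovered task attempts.
   */
  static Map<String, Long> recoverDagCounters(
      Map<String, Long> dagFinishedCounters, List<Map<String, Long>> attemptCounters) {
    if (dagFinishedCounters != null) {
      return new HashMap<>(dagFinishedCounters);
    }
    return aggregate(attemptCounters);
  }

  public static void main(String[] args) {
    Map<String, Long> first = new HashMap<>();
    first.put("RECORDS", 3L);
    Map<String, Long> second = new HashMap<>();
    second.put("RECORDS", 4L);
    List<Map<String, Long>> attempts = new ArrayList<>();
    attempts.add(first);
    attempts.add(second);
    // No DAG-finished counters available, so the attempts are summed.
    System.out.println(recoverDagCounters(null, attempts));
  }
}
```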
[jira] [Assigned] (TEZ-1357) Display better diagnostics when AM fails to launch
[ https://issues.apache.org/jira/browse/TEZ-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang reassigned TEZ-1357: --- Assignee: Jeff Zhang > Display better diagnostics when AM fails to launch > -- > > Key: TEZ-1357 > URL: https://issues.apache.org/jira/browse/TEZ-1357 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Jeff Zhang > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (TEZ-1512) VertexImpl.getTask(int) can be CPU intensive when lots of tasks are present in the vertex
[ https://issues.apache.org/jira/browse/TEZ-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan reassigned TEZ-1512: - Assignee: Rajesh Balamohan > VertexImpl.getTask(int) can be CPU intensive when lots of tasks are present > in the vertex > - > > Key: TEZ-1512 > URL: https://issues.apache.org/jira/browse/TEZ-1512 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Labels: performance > Attachments: TEZ-1512.1.WIP.patch, TEZ-1512.2.patch, > large_job_small_tasks.svg, with_patch_large_job_small_tasks.svg > > > I tried a synthetic benchmark (without much input data) with the tez app. > The goal was to understand the bare minimum time taken by Tez for container > launch / reuse / scheduling etc. > Profiling DAGAppMaster showed that lots of CPU time was spent on > VertexImpl.getTask(int), which gets accessed as a part of event handling and > transitions. > This problem would be more prevalent in large jobs that have lots of small > tasks. > I will attach the perf SVG output of the DAG soon. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (TEZ-1512) VertexImpl.getTask(int) can be CPU intensive when lots of tasks are present in the vertex
[ https://issues.apache.org/jira/browse/TEZ-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan resolved TEZ-1512. --- Resolution: Fixed Fix Version/s: 0.6.0 Hadoop Flags: Reviewed Thanks [~sseth]. Committed to master. commit ddef389a976793da397856f397398bdddc8db123 Author: Rajesh Balamohan Date: Thu Aug 28 13:41:04 2014 +0530 > VertexImpl.getTask(int) can be CPU intensive when lots of tasks are present > in the vertex > - > > Key: TEZ-1512 > URL: https://issues.apache.org/jira/browse/TEZ-1512 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Labels: performance > Fix For: 0.6.0 > > Attachments: TEZ-1512.1.WIP.patch, TEZ-1512.2.patch, > large_job_small_tasks.svg, with_patch_large_job_small_tasks.svg > > > I tried a synthetic benchmark (without much input data) with the tez app. > The goal was to understand the bare minimum time taken by Tez for container > launch / reuse / scheduling etc. > Profiling DAGAppMaster showed that lots of CPU time was spent on > VertexImpl.getTask(int), which gets accessed as a part of event handling and > transitions. > This problem would be more prevalent in large jobs that have lots of small > tasks. > I will attach the perf SVG output of the DAG soon. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1515) DAGAppMaster : Thread contentions due to org.apache.tez.common.counters.ResourceBundles
[ https://issues.apache.org/jira/browse/TEZ-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-1515: -- Attachment: detailed_sample_stack_trace.txt HistoryLoggingService.png DAGAppMaster_AsyncDispatcher.png RecoveryService.png > DAGAppMaster : Thread contentions due to > org.apache.tez.common.counters.ResourceBundles > --- > > Key: TEZ-1515 > URL: https://issues.apache.org/jira/browse/TEZ-1515 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan > Labels: performance > Attachments: DAGAppMaster_AsyncDispatcher.png, > HistoryLoggingService.png, RecoveryService.png, > detailed_sample_stack_trace.txt > > > Thread profiling DAGAppMaster for a synthetic tez test revealed a lot of > contention in the RecoveryService / HistoryEventHandlingThread / AsyncDispatcher > threads. All of these try to access tez counters and are blocked on "public > static synchronized <T> T getValue(String bundleName, String key, String > suffix, T defaultValue)". > I will attach the thread profiler snapshots soon. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (TEZ-1515) DAGAppMaster : Thread contentions due to org.apache.tez.common.counters.ResourceBundles
Rajesh Balamohan created TEZ-1515: - Summary: DAGAppMaster : Thread contentions due to org.apache.tez.common.counters.ResourceBundles Key: TEZ-1515 URL: https://issues.apache.org/jira/browse/TEZ-1515 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Attachments: DAGAppMaster_AsyncDispatcher.png, HistoryLoggingService.png, RecoveryService.png, detailed_sample_stack_trace.txt Thread profiling DAGAppMaster for a synthetic tez test revealed a lot of contention in the RecoveryService / HistoryEventHandlingThread / AsyncDispatcher threads. All of these try to access tez counters and are blocked on "public static synchronized <T> T getValue(String bundleName, String key, String suffix, T defaultValue)". I will attach the thread profiler snapshots soon. -- This message was sent by Atlassian JIRA (v6.2#6252)
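Editorial sketch of one common way to relieve contention on a static synchronized lookup like the one named above: cache results in a ConcurrentHashMap so that repeated reads do not take a class-wide lock. This is an assumption about a possible mitigation, not the change made for this issue. The class name, the cache key format, and the slowLookup function (a stand-in for the real resource bundle read) are hypothetical.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical cache in front of a slow, lock-protected lookup. */
final class CachedLookupSketch {
  // Optional lets "looked up, nothing found" be cached as well.
  private final ConcurrentMap<String, Optional<String>> cache = new ConcurrentHashMap<>();
  private final Function<String, String> slowLookup;

  CachedLookupSketch(Function<String, String> slowLookup) {
    this.slowLookup = slowLookup;
  }

  /** Returns the cached value for the key, or defaultValue if the lookup found nothing. */
  String getValue(String bundleName, String key, String suffix, String defaultValue) {
    String cacheKey = bundleName + "#" + key + suffix;
    return cache
        .computeIfAbsent(cacheKey, k -> Optional.ofNullable(slowLookup.apply(k)))
        .orElse(defaultValue);
  }

  public static void main(String[] args) {
    CachedLookupSketch lookup =
        new CachedLookupSketch(k -> k.equals("bundle#known.name") ? "Known Counter" : null);
    System.out.println(lookup.getValue("bundle", "known", ".name", "fallback"));
    System.out.println(lookup.getValue("bundle", "other", ".name", "fallback"));
  }
}
```

After the first call for a given key, readers on different threads are served from the map without blocking one another, which is the property the blocked dispatcher threads in the profile would need.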
[jira] [Updated] (TEZ-1495) ATS integration for TezClient
[ https://issues.apache.org/jira/browse/TEZ-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Ramachandran updated TEZ-1495: -- Attachment: TEZ-1495.2.patch - removed the wrong unit test and added a unit test for getvertex. > ATS integration for TezClient > - > > Key: TEZ-1495 > URL: https://issues.apache.org/jira/browse/TEZ-1495 > Project: Apache Tez > Issue Type: Bug >Reporter: Prakash Ramachandran >Assignee: Prakash Ramachandran > Attachments: TEZ-1495.1.patch, TEZ-1495.2.patch, TEZ-1495.WIP.1.patch > > > Tez client should automatically redirect to ATS when the AM is not running. > All APIs exposed ( DAG status, counters, etc ) from the DAGClient should > continue to work after the AM has shut down. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1345) Add checks to guarantee all init events are written to recovery to consider vertex initialized
[ https://issues.apache.org/jira/browse/TEZ-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114025#comment-14114025 ] Hitesh Shah commented on TEZ-1345: -- bq. So overall IMO, I prefer to ignore the init events in recovery log and call initializer again in recovery. It only affects the performance of recovery while the method of adding check in canInitVertex would affect the performance of normal run of dag. Hitesh Shah, Bikas Saha What are your thoughts? If you do this, this will result in the vertex starting from scratch. Even completed tasks will have to be dropped as there is no guarantee that the initializer will generate the same events and assign them in the same way to the tasks. > Add checks to guarantee all init events are written to recovery to consider > vertex initialized > -- > > Key: TEZ-1345 > URL: https://issues.apache.org/jira/browse/TEZ-1345 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Jeff Zhang > Attachments: Tez-1345-2.patch, Tez-1345.patch > > > Related to issue discovered in TEZ-1033 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (TEZ-1516) Log transfer rate for Broadcast Fetch
Siddharth Seth created TEZ-1516: --- Summary: Log transfer rate for Broadcast Fetch Key: TEZ-1516 URL: https://issues.apache.org/jira/browse/TEZ-1516 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth Assignee: Siddharth Seth -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1509) Set a useful default value for java opts
[ https://issues.apache.org/jira/browse/TEZ-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114049#comment-14114049 ] Bikas Saha commented on TEZ-1509: - Can someone review/comment so that this can be committed for 0.5.0? It's an incompatible change. > Set a useful default value for java opts > -- > > Key: TEZ-1509 > URL: https://issues.apache.org/jira/browse/TEZ-1509 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Bikas Saha > Attachments: TEZ-1509.1.patch > > > A subset of the following should be considered for the defaults: > -server -XX:+UseCompressedStrings -Djava.net.preferIPv4Stack=true > -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA > -XX:+UseParallelGC -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1501) Add a test dag to generate load on the getTask RPC
[ https://issues.apache.org/jira/browse/TEZ-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114052#comment-14114052 ] Gopal V commented on TEZ-1501: -- Looks good - +1 Just needs an fs.deleteOnExit() for the PAYLOAD file for cleanups. > Add a test dag to generate load on the getTask RPC > -- > > Key: TEZ-1501 > URL: https://issues.apache.org/jira/browse/TEZ-1501 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: TEZ-1501.1.txt, TEZ-1501.2.txt > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Reopened] (TEZ-1510) TezConfiguration should not add tez-site.xml as a default resource.
[ https://issues.apache.org/jira/browse/TEZ-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah reopened TEZ-1510: -- > TezConfiguration should not add tez-site.xml as a default resource. > > > Key: TEZ-1510 > URL: https://issues.apache.org/jira/browse/TEZ-1510 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Priority: Blocker > > Currently on the first construction of a TezConfiguration, tez-site.xml gets > added as a static resource for all future Configuration objects. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (TEZ-1510) TezConfiguration should not add tez-site.xml as a default resource.
[ https://issues.apache.org/jira/browse/TEZ-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah reassigned TEZ-1510: Assignee: Hitesh Shah > TezConfiguration should not add tez-site.xml as a default resource. > > > Key: TEZ-1510 > URL: https://issues.apache.org/jira/browse/TEZ-1510 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Hitesh Shah >Priority: Blocker > > Currently on the first construction of a TezConfiguration, tez-site.xml gets > added as a static resource for all future Configuration objects. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1345) Add checks to guarantee all init events are written to recovery to consider vertex initialized
[ https://issues.apache.org/jira/browse/TEZ-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114081#comment-14114081 ] Bikas Saha commented on TEZ-1345: - There are 2 alternatives: 1) pessimistic - save events before starting. This hurts performance. This patch is not really achieving that. 2) optimistic - save events while starting. The only case where this won't work is when the AM crashes immediately after. In both cases, for now, the contract for init events is that they must be made up-front. So it's a one-time thing. When that changes, there will need to be an additional mechanism to notify the framework that initing is done. And in fact it may not be done until the last block of data gets assigned to an owner at the very end of execution. How recovery is going to work in these cases is still not clear, though the optimistic approach still works where it works. IMO the performance loss is probably not going to be acceptable for short queries. What we could do is add an API that allows the VertexManager to notify the framework that it is done making updates. It could also pass along a state payload that represents its state in case we need to restart it. That notification could be saved in the log. If that notification is present during recovery then we can continue to recover from where we left off and also provide state to the VM. If that notification is not present in recovery then we start from scratch. IMO, in 99% of the cases this should be enough. The contract for VMs then clearly becomes: recovery works post DONE notification. 
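Editorial sketch of the proposal in the comment above, assuming the recovery log can be modelled as a list of entries. All names here (LogEntry, Kind.VM_DONE, recoveryPlan) are hypothetical and do not correspond to an existing Tez API: if a DONE notification from the VertexManager is found in the log, recovery resumes and hands the saved payload back to the VM, otherwise the vertex starts from scratch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical model of the "recovery works post DONE notification" contract. */
final class DoneNotificationSketch {
  enum Kind { INIT_EVENT, VM_DONE, TASK_FINISHED }

  static final class LogEntry {
    final Kind kind;
    final String vertex;
    final byte[] payload; // VM state, only meaningful for VM_DONE entries

    LogEntry(Kind kind, String vertex, byte[] payload) {
      this.kind = kind;
      this.vertex = vertex;
      this.payload = payload;
    }
  }

  /** Returns the saved VM state if a DONE notification was logged for the vertex. */
  static Optional<byte[]> findDonePayload(List<LogEntry> log, String vertex) {
    for (LogEntry entry : log) {
      if (entry.kind == Kind.VM_DONE && entry.vertex.equals(vertex)) {
        return Optional.of(entry.payload);
      }
    }
    return Optional.empty();
  }

  /** RESUME when the notification is present, RESTART (from scratch) otherwise. */
  static String recoveryPlan(List<LogEntry> log, String vertex) {
    return findDonePayload(log, vertex).isPresent() ? "RESUME" : "RESTART";
  }

  public static void main(String[] args) {
    List<LogEntry> log = new ArrayList<>();
    log.add(new LogEntry(Kind.INIT_EVENT, "v1", new byte[0]));
    log.add(new LogEntry(Kind.VM_DONE, "v1", new byte[] {1, 2}));
    log.add(new LogEntry(Kind.INIT_EVENT, "v2", new byte[0]));
    System.out.println("v1: " + recoveryPlan(log, "v1"));
    System.out.println("v2: " + recoveryPlan(log, "v2"));
  }
}
```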
> Add checks to guarantee all init events are written to recovery to consider > vertex initialized > -- > > Key: TEZ-1345 > URL: https://issues.apache.org/jira/browse/TEZ-1345 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Jeff Zhang > Attachments: Tez-1345-2.patch, Tez-1345.patch > > > Related to issue discovered in TEZ-1033 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1501) Add a test dag to generate load on the getTask RPC
[ https://issues.apache.org/jira/browse/TEZ-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-1501: Attachment: TEZ-1501.3.txt Updated patch. Committing. Thanks for the review. > Add a test dag to generate load on the getTask RPC > -- > > Key: TEZ-1501 > URL: https://issues.apache.org/jira/browse/TEZ-1501 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 0.6.0 > > Attachments: TEZ-1501.1.txt, TEZ-1501.2.txt, TEZ-1501.3.txt > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (TEZ-1501) Add a test dag to generate load on the getTask RPC
[ https://issues.apache.org/jira/browse/TEZ-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth resolved TEZ-1501. - Resolution: Fixed Fix Version/s: 0.6.0 Committed to master. > Add a test dag to generate load on the getTask RPC > -- > > Key: TEZ-1501 > URL: https://issues.apache.org/jira/browse/TEZ-1501 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 0.6.0 > > Attachments: TEZ-1501.1.txt, TEZ-1501.2.txt, TEZ-1501.3.txt > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1510) TezConfiguration should not add tez-site.xml as a default resource.
[ https://issues.apache.org/jira/browse/TEZ-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-1510: - Attachment: TEZ-1510.1.patch [~sseth] review please. > TezConfiguration should not add tez-site.xml as a default resource. > > > Key: TEZ-1510 > URL: https://issues.apache.org/jira/browse/TEZ-1510 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Hitesh Shah >Priority: Blocker > Attachments: TEZ-1510.1.patch > > > Currently on the first construction of a TezConfiguration, tez-site.xml gets > added as a static resource for all future Configuration objects. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1510) TezConfiguration should not add tez-site.xml as a default resource.
[ https://issues.apache.org/jira/browse/TEZ-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114123#comment-14114123 ] Bikas Saha commented on TEZ-1510: - Does the test fail without the changes? > TezConfiguration should not add tez-site.xml as a default resource. > > > Key: TEZ-1510 > URL: https://issues.apache.org/jira/browse/TEZ-1510 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Hitesh Shah >Priority: Blocker > Attachments: TEZ-1510.1.patch > > > Currently on the first construction of a TezConfiguration, tez-site.xml gets > added as a static resource for all future Configuration objects. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1511) MROutputConfigBuilder sets OutputFormat as String class if OutputFormat is not provided
[ https://issues.apache.org/jira/browse/TEZ-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114131#comment-14114131 ] Hitesh Shah commented on TEZ-1511: -- The use of NEW_API_CONFIG seems incorrect. There should be 2 such fields - one for mapper and one for the reducer - as there seems to be a mix of both mapper.new-api and reducer.new-api being used ( though not sure if that is intended or a bug ). For future reference, Configuration::getClassByName seems a better implementation than ReflectionUtils::getClazz. > MROutputConfigBuilder sets OutputFormat as String class if OutputFormat is > not provided > --- > > Key: TEZ-1511 > URL: https://issues.apache.org/jira/browse/TEZ-1511 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Bikas Saha >Priority: Blocker > Attachments: TEZ-1511.1.patch, TEZ-1511.2.patch > > > Code uses: > {code} > this.outputFormat = > ReflectionUtils.getClass(conf.get(MRJobConfig.OUTPUT_FORMAT_CLASS_ATTR)); > } else { > this.outputFormat = > ReflectionUtils.getClass(conf.get("mapred.output.format.class")); > {code} > where ReflectionUtils has : > {code} > Class<T> getClass(T o) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
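Editorial sketch of why the code quoted in the issue description yields the String class. A generic helper that returns the runtime class of its argument, when handed a class name held in a String, reports java.lang.String rather than the class the name refers to. The helper names below are stand-ins written for this illustration, and java.util.ArrayList is used as an arbitrary configured class name so the example runs without Hadoop on the classpath.

```java
/** Stand-alone illustration of the pitfall; not the Tez or Hadoop code itself. */
final class GetClassPitfall {

  /** Same shape as the quoted helper: returns the runtime class of the object passed in. */
  @SuppressWarnings("unchecked")
  static <T> Class<T> getClassOf(T o) {
    return (Class<T>) o.getClass();
  }

  /** Resolves a class from its name, which is what the caller actually wanted. */
  static Class<?> loadByName(String className) throws ClassNotFoundException {
    return Class.forName(className);
  }

  public static void main(String[] args) throws Exception {
    String configured = "java.util.ArrayList"; // stand-in for a configured class name
    System.out.println(getClassOf(configured)); // prints "class java.lang.String"
    System.out.println(loadByName(configured)); // prints "class java.util.ArrayList"
  }
}
```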
[jira] [Updated] (TEZ-1280) Timeline server integration for DAG history
[ https://issues.apache.org/jira/browse/TEZ-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-1280: - Assignee: Prakash Ramachandran > Timeline server integration for DAG history > --- > > Key: TEZ-1280 > URL: https://issues.apache.org/jira/browse/TEZ-1280 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Prakash Ramachandran >Priority: Critical > > Umbrella jira to detail out all tasks to complete integration of Tez Client > and DAG for history. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1511) MROutputConfigBuilder sets OutputFormat as String class if OutputFormat is not provided
[ https://issues.apache.org/jira/browse/TEZ-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114148#comment-14114148 ] Bikas Saha commented on TEZ-1511: - bq. The use of NEW_API_CONFIG seems incorrect. Not sure what you mean here. The field is different for the Input and the Output. The Input always has mapper and the Output always has reducer. The patch removes a bug in the Output where it was looking at mapper. The private constant prevents such bugs in the future. I can change to use Configuration::getClassByName instead. > MROutputConfigBuilder sets OutputFormat as String class if OutputFormat is > not provided > --- > > Key: TEZ-1511 > URL: https://issues.apache.org/jira/browse/TEZ-1511 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Bikas Saha >Priority: Blocker > Attachments: TEZ-1511.1.patch, TEZ-1511.2.patch > > > Code uses: > {code} > this.outputFormat = > ReflectionUtils.getClass(conf.get(MRJobConfig.OUTPUT_FORMAT_CLASS_ATTR)); > } else { > this.outputFormat = > ReflectionUtils.getClass(conf.get("mapred.output.format.class")); > {code} > where ReflectionUtils has : > {code} > Class<T> getClass(T o) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1516) Log transfer rate for Broadcast Fetch
[ https://issues.apache.org/jira/browse/TEZ-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-1516: Attachment: TEZ-1516.1.txt Simple patch to log transfer times for individual fetches, as well as an average. [~gopalv] - please review. > Log transfer rate for Broadcast Fetch > - > > Key: TEZ-1516 > URL: https://issues.apache.org/jira/browse/TEZ-1516 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: TEZ-1516.1.txt > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1504) Ordered Input Shuffle can hang if there's errors while creating the Fetcher
[ https://issues.apache.org/jira/browse/TEZ-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-1504: - Priority: Critical (was: Major) > Ordered Input Shuffle can hang if there's errors while creating the Fetcher > --- > > Key: TEZ-1504 > URL: https://issues.apache.org/jira/browse/TEZ-1504 > Project: Apache Tez > Issue Type: Bug >Reporter: Siddharth Seth >Priority: Critical > > As an example, a missing codec will cause the Fetcher to throw an exception - > which causes the tracking thread to die. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (TEZ-1517) Avoid sending routed events via the AsyncDispatcher
Siddharth Seth created TEZ-1517: --- Summary: Avoid sending routed events via the AsyncDispatcher Key: TEZ-1517 URL: https://issues.apache.org/jira/browse/TEZ-1517 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth Assignee: Siddharth Seth Priority: Critical Sending them via the queue ends up creating lots of unnecessary objects (millions for a large job), as well as blocking the queue. Eventually, event routing should be handed over to a separate thread - so that the AsyncDispatcher is unblocked to continue operations like launching tasks, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1510) TezConfiguration should not add tez-site.xml as a default resource.
[ https://issues.apache.org/jira/browse/TEZ-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-1510: - Attachment: TEZ-1510.2.patch Modified test to compile without patch. > TezConfiguration should not add tez-site.xml as a default resource. > > > Key: TEZ-1510 > URL: https://issues.apache.org/jira/browse/TEZ-1510 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Hitesh Shah >Priority: Blocker > Attachments: TEZ-1510.1.patch, TEZ-1510.2.patch > > > Currently on the first construction of a TezConfiguration, tez-site.xml gets > added as a static resource for all future Configuration objects. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1517) Avoid sending routed events via the AsyncDispatcher
[ https://issues.apache.org/jira/browse/TEZ-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-1517: Attachment: TEZ-1517.1.txt Simple patch, to send events directly to tasks. [~bikassaha], [~hitesh] - review please. > Avoid sending routed events via the AsyncDispatcher > --- > > Key: TEZ-1517 > URL: https://issues.apache.org/jira/browse/TEZ-1517 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Critical > Attachments: TEZ-1517.1.txt > > > Sending them via the queue ends up creating lots of unnecessary objects > (millions for a large job), as well as blocking the queue. > Eventually, event routing should be handed over to a separate thread - so > that the AsyncDispatcher is unblocked to continue operations like launching > tasks, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1510) TezConfiguration should not add tez-site.xml as a default resource.
[ https://issues.apache.org/jira/browse/TEZ-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114269#comment-14114269 ] Siddharth Seth commented on TEZ-1510: - +1. Looks good. It's a little risky, since we don't know if there was some specific place where we're inadvertently relying on tez-site being part of the Configuration because it was a default resource. Most of the runtime bits should be fine, since they work off of a payload sent from the client side. > TezConfiguration should not add tez-site.xml as a default resource. > > > Key: TEZ-1510 > URL: https://issues.apache.org/jira/browse/TEZ-1510 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Hitesh Shah >Priority: Blocker > Attachments: TEZ-1510.1.patch, TEZ-1510.2.patch > > > Currently on the first construction of a TezConfiguration, tez-site.xml gets > added as a static resource for all future Configuration objects. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1310) Update website documentation framework
[ https://issues.apache.org/jira/browse/TEZ-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114270#comment-14114270 ] Jonathan Eagles commented on TEZ-1310: -- One thing I forgot to mention is that the markdown back-end for maven-site-plugin (doxia-markdown-plugin) is pegdown https://github.com/sirthias/pegdown and it doesn't support roman numeral html lists. > Update website documentation framework > -- > > Key: TEZ-1310 > URL: https://issues.apache.org/jira/browse/TEZ-1310 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Jonathan Eagles > Attachments: TEZ-1310-v1.patch, TEZ-1310-v2.patch > > > A better option for docs would be to use markdown format. Also, it might be > worth investigating moving to cms instead of svnpubsub. > https://www.apache.org/dev/project-site.html -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1510) TezConfiguration should not add tez-site.xml as a default resource.
[ https://issues.apache.org/jira/browse/TEZ-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-1510: - Attachment: TEZ-1510.3.patch Minor tweak to test. Committing shortly. > TezConfiguration should not add tez-site.xml as a default resource. > > > Key: TEZ-1510 > URL: https://issues.apache.org/jira/browse/TEZ-1510 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Hitesh Shah >Priority: Blocker > Attachments: TEZ-1510.1.patch, TEZ-1510.2.patch, TEZ-1510.3.patch > > > Currently on the first construction of a TezConfiguration, tez-site.xml gets > added as a static resource for all future Configuration objects. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1511) MROutputConfigBuilder sets OutputFormat as String class if OutputFormat is not provided
[ https://issues.apache.org/jira/browse/TEZ-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-1511: Attachment: TEZ-1511.3.patch Made the changes. Changed a test to use this version of the API. > MROutputConfigBuilder sets OutputFormat as String class if OutputFormat is > not provided > --- > > Key: TEZ-1511 > URL: https://issues.apache.org/jira/browse/TEZ-1511 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Bikas Saha >Priority: Blocker > Attachments: TEZ-1511.1.patch, TEZ-1511.2.patch, TEZ-1511.3.patch > > > Code uses: > {code} > this.outputFormat = > ReflectionUtils.getClass(conf.get(MRJobConfig.OUTPUT_FORMAT_CLASS_ATTR)); > } else { > this.outputFormat = > ReflectionUtils.getClass(conf.get("mapred.output.format.class")); > {code} > where ReflectionUtils has : > {code} > Class<T> getClass(T o) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1376) Investigate independent parallel DAGs execution in Local Mode
[ https://issues.apache.org/jira/browse/TEZ-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated TEZ-1376: - Summary: Investigate independent parallel DAGs execution in Local Mode (was: Support independent parallel DAGs execution in Local Mode) > Investigate independent parallel DAGs execution in Local Mode > - > > Key: TEZ-1376 > URL: https://issues.apache.org/jira/browse/TEZ-1376 > Project: Apache Tez > Issue Type: Sub-task >Affects Versions: 0.4.1 >Reporter: Chen He > Attachments: differentParallelJob-ErrorOnTerminal.txt, > differentParallelJob-surefireOut.txt, differentParallelJob.patch, > simpleTestCase.patch > > > Pig on Tez allows users to submit parallel DAGs in a single pig script. Those > DAGs could be independent and concurrent. The current LocalMode may encounter > some problems when concurrent parallel DAGs are submitted. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (TEZ-1376) Investigate independent parallel DAGs execution in Local Mode
[ https://issues.apache.org/jira/browse/TEZ-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He reassigned TEZ-1376: Assignee: Chen He > Investigate independent parallel DAGs execution in Local Mode > - > > Key: TEZ-1376 > URL: https://issues.apache.org/jira/browse/TEZ-1376 > Project: Apache Tez > Issue Type: Sub-task >Affects Versions: 0.4.1 >Reporter: Chen He >Assignee: Chen He > Attachments: differentParallelJob-ErrorOnTerminal.txt, > differentParallelJob-surefireOut.txt, differentParallelJob.patch, > simpleTestCase.patch > > > Pig on Tez allows users to submit parallel DAGs in a single pig script. Those > DAGs could be independent and concurrent. The current LocalMode may encounter > some problems when concurrent parallel DAGs are submitted. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (TEZ-1518) Clean up ID caches on DAG completion
Siddharth Seth created TEZ-1518: --- Summary: Clean up ID caches on DAG completion Key: TEZ-1518 URL: https://issues.apache.org/jira/browse/TEZ-1518 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1508) Log a warning if Xmx is configured incorrectly.
[ https://issues.apache.org/jira/browse/TEZ-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114346#comment-14114346 ] Siddharth Seth commented on TEZ-1508: - Should we just be failing if this is configured incorrectly? > Log a warning if Xmx is configured incorrectly. > > > Key: TEZ-1508 > URL: https://issues.apache.org/jira/browse/TEZ-1508 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Jonathan Eagles > Labels: newbie > > Users may incorrectly configure Xmx for tasks. Xmx should in most cases be > less than the container/task size given that YARN will kill containers that > exceed the memory limits. > Given that we already parse the java opts to detect Xmx, it > should be trivial to add a check if the value is valid and log a warning if > not ( for both the Tez AM and the vertices in a DAG). -- This message was sent by Atlassian JIRA (v6.2#6252)
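Editorial sketch of the check described in this issue, assuming the java opts arrive as a single string and the container size is known in MB. The class and method names are hypothetical and this is not the parsing code Tez already has. It also assumes that when -Xmx appears more than once the last occurrence takes effect.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical Xmx sanity check; a sketch, not Tez's implementation. */
final class XmxCheckSketch {
  private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([kKmMgG]?)");

  /** Returns the last -Xmx value found, converted to MB, or -1 if none is present. */
  static long parseXmxMb(String javaOpts) {
    Matcher m = XMX.matcher(javaOpts);
    long mb = -1;
    while (m.find()) {
      long value = Long.parseLong(m.group(1));
      switch (m.group(2).toLowerCase(Locale.ROOT)) {
        case "g": mb = value * 1024; break;
        case "m": mb = value; break;
        case "k": mb = value / 1024; break;
        default: mb = value / (1024 * 1024); // no suffix means bytes
      }
    }
    return mb;
  }

  /** True when Xmx is set and is not smaller than the container size. */
  static boolean exceedsContainer(String javaOpts, long containerMb) {
    long xmxMb = parseXmxMb(javaOpts);
    return xmxMb >= 0 && xmxMb >= containerMb;
  }

  public static void main(String[] args) {
    String opts = "-server -Xmx2048m -XX:+UseNUMA";
    if (exceedsContainer(opts, 1024)) {
      System.out.println("WARN: Xmx (" + parseXmxMb(opts) + " MB) is not below the container size");
    }
  }
}
```

Whether the caller logs a warning or fails outright, as the comment asks, is a policy decision layered on top of the same check.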
[jira] [Commented] (TEZ-1499) Add OrderedJoinExample to tez-examples
[ https://issues.apache.org/jira/browse/TEZ-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114345#comment-14114345 ] Siddharth Seth commented on TEZ-1499: - Joins are a common operation, for which separate examples that highlight the different kinds would be useful. An alternate example could be used to demonstrate the composable nature - wordcount for example, which can be implemented with an unsorted edge. > Add OrderedJoinExample to tez-examples > -- > > Key: TEZ-1499 > URL: https://issues.apache.org/jira/browse/TEZ-1499 > Project: Apache Tez > Issue Type: Bug >Reporter: Jeff Zhang >Assignee: Jeff Zhang > > In the current join example, the inputs of JoinProcessor are unordered, so > it always needs to load one input into memory and stream the other input. > This only fits the case where one dataset is small enough to fit into > memory ( even without broadcast, memory may not be enough ). So I'd like to > add another join example that makes the inputs of JoinProcessor ordered ( > using OrderedPartitionedKVEdgeConfig ). This kind of join could be used > when both of the 2 datasets are large. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1376) Investigate independent parallel DAGs execution in Local Mode
[ https://issues.apache.org/jira/browse/TEZ-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated TEZ-1376: - Attachment: (was: differentParallelJob-ErrorOnTerminal.txt) > Investigate independent parallel DAGs execution in Local Mode > - > > Key: TEZ-1376 > URL: https://issues.apache.org/jira/browse/TEZ-1376 > Project: Apache Tez > Issue Type: Sub-task >Affects Versions: 0.4.1 >Reporter: Chen He >Assignee: Chen He > Attachments: differentParallelJob-surefireOut.txt, > differentParallelJob.patch, simpleTestCase.patch > > > Pig on Tez allows user to submit parallel DAGs in a single pig script. Those > DAGs could be independent and concurrent. Current LocalMode may encounter > some problems when concurrent parallel DAGs are submitted. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1510) TezConfiguration should not add tez-site.xml as a default resource.
[ https://issues.apache.org/jira/browse/TEZ-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-1510: - Attachment: TEZ-1510.3.addendum.patch > TezConfiguration should not add tez-site.xml as a default resource. > > > Key: TEZ-1510 > URL: https://issues.apache.org/jira/browse/TEZ-1510 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Hitesh Shah >Priority: Blocker > Fix For: 0.5.0 > > Attachments: TEZ-1510.1.patch, TEZ-1510.2.patch, > TEZ-1510.3.addendum.patch, TEZ-1510.3.patch > > > Currently on the first construction of a TezConfiguration, tez-site.xml gets > added as a static resource for all future Configuration objects. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1511) MROutputConfigBuilder sets OutputFormat as String class if OutputFormat is not provided
[ https://issues.apache.org/jira/browse/TEZ-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114361#comment-14114361 ] Siddharth Seth commented on TEZ-1511: - "mapred.input.format.class" - would be good to have this as a constant as well, similar to the new API constant. Likewise for the OutputFormat. Why does the UnionExample need to change? A simple test to validate the correct path would be useful. Rest looks good. > MROutputConfigBuilder sets OutputFormat as String class if OutputFormat is > not provided > --- > > Key: TEZ-1511 > URL: https://issues.apache.org/jira/browse/TEZ-1511 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Bikas Saha >Priority: Blocker > Attachments: TEZ-1511.1.patch, TEZ-1511.2.patch, TEZ-1511.3.patch > > > Code uses: > {code} > this.outputFormat = > ReflectionUtils.getClass(conf.get(MRJobConfig.OUTPUT_FORMAT_CLASS_ATTR)); > } else { > this.outputFormat = > ReflectionUtils.getClass(conf.get("mapred.output.format.class")); > {code} > where ReflectionUtils has : > {code} > <T> Class<T> getClass(T o) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
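The behaviour described in TEZ-1511 can be reproduced without Hadoop on the classpath. The sketch below is only an illustration: `OutputFormatLookup`, `getClassOf` and `resolveByName` are hypothetical names, and `getClassOf` is a stand-in that mimics the quoted `getClass(T o)` signature rather than the actual Hadoop implementation. It shows why passing a class *name* to such a method yields `String.class`, and how resolving the name produces the intended class instead.

```java
final class OutputFormatLookup {

    // Stand-in for the quoted signature: returns the runtime class of the
    // argument itself. Given a String holding a class name, that is String.
    @SuppressWarnings("unchecked")
    static <T> Class<T> getClassOf(T o) {
        return (Class<T>) o.getClass();
    }

    // What the caller most likely wanted: load the class the name refers to.
    static Class<?> resolveByName(String className) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown class: " + className, e);
        }
    }

    public static void main(String[] args) {
        String configured = "java.util.ArrayList"; // placeholder for a configured class name
        System.out.println(getClassOf(configured));    // the class of the String object
        System.out.println(resolveByName(configured)); // the class named by the String
    }
}
```

In real code, Hadoop's `Configuration` offers class-typed getters that resolve names directly, which would probably be preferable to a manual `Class.forName`.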
[jira] [Commented] (TEZ-1508) Log a warning if Xmx is configured incorrectly.
[ https://issues.apache.org/jira/browse/TEZ-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114363#comment-14114363 ] Hitesh Shah commented on TEZ-1508: -- No guarantees on whether this is intentionally done by a user on a cluster with memory-monitoring disabled. > Log a warning if Xmx is configured incorrectly. > > > Key: TEZ-1508 > URL: https://issues.apache.org/jira/browse/TEZ-1508 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Jonathan Eagles > Labels: newbie > > Users may incorrectly configure Xmx for tasks. Xmx should in most cases be > less than the container/task size given that YARN will kill containers that > exceed the memory limits. > Given that we already parse the java opts to detect Xmx in the java opts, it > should be trivial to add a check if the value is valid and log a warning if > not ( for both the Tez AM and the vertices in a DAG). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (TEZ-1508) Log a warning if Xmx is configured incorrectly.
[ https://issues.apache.org/jira/browse/TEZ-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114363#comment-14114363 ] Hitesh Shah edited comment on TEZ-1508 at 8/28/14 9:19 PM: --- No guarantees on whether this is intentionally done by a user on a cluster with memory-monitoring disabled. Also, a high Xmx need not imply the process will use that much memory. was (Author: hitesh): No guarantees on whether this is intentionally done by a user on a cluster with memory-monitoring disabled. > Log a warning if Xmx is configured incorrectly. > > > Key: TEZ-1508 > URL: https://issues.apache.org/jira/browse/TEZ-1508 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Jonathan Eagles > Labels: newbie > > Users may incorrectly configure Xmx for tasks. Xmx should in most cases be > less than the container/task size given that YARN will kill containers that > exceed the memory limits. > Given that we already parse the java opts to detect Xmx in the java opts, it > should be trivial to add a check if the value is valid and log a warning if > not ( for both the Tez AM and the vertices in a DAG). -- This message was sent by Atlassian JIRA (v6.2#6252)
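The check proposed in TEZ-1508 could look roughly like the following. This is a minimal sketch, not the Tez implementation: the class `XmxCheck` and its methods are hypothetical, the regular expression covers only the plain `-Xmx<n>[k|m|g]` form, and it assumes that when `-Xmx` appears more than once the last occurrence is the one that takes effect. In line with the comments above, it only reports the condition so the caller can log a warning rather than fail.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XmxCheck {

    // Matches e.g. -Xmx512m, -Xmx2g, -Xmx1048576k or -Xmx1073741824 (bytes).
    private static final Pattern XMX =
        Pattern.compile("-Xmx(\\d+)([kKmMgG]?)(?=\\s|$)");

    /** Returns the value of the last -Xmx option in bytes, if one is present. */
    static OptionalLong parseXmxBytes(String javaOpts) {
        Matcher m = XMX.matcher(javaOpts);
        OptionalLong result = OptionalLong.empty();
        while (m.find()) {
            long value = Long.parseLong(m.group(1));
            long multiplier;
            switch (m.group(2).toLowerCase()) {
                case "k": multiplier = 1024L; break;
                case "m": multiplier = 1024L * 1024L; break;
                case "g": multiplier = 1024L * 1024L * 1024L; break;
                default:  multiplier = 1L; // no suffix means bytes
            }
            result = OptionalLong.of(value * multiplier);
        }
        return result;
    }

    /** True if -Xmx is set and is not smaller than the container size. */
    static boolean exceedsContainer(String javaOpts, long containerMb) {
        OptionalLong xmx = parseXmxBytes(javaOpts);
        return xmx.isPresent() && xmx.getAsLong() >= containerMb * 1024L * 1024L;
    }

    public static void main(String[] args) {
        String opts = "-server -Xmx2g -Djava.net.preferIPv4Stack=true";
        if (exceedsContainer(opts, 1024)) {
            System.err.println("WARN: Xmx in '" + opts + "' is not below the 1024 MB container size");
        }
    }
}
```

Warning instead of failing keeps the case open where a user deliberately sets a large Xmx on a cluster with memory monitoring disabled.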
[jira] [Commented] (TEZ-1509) Set a useful default value for java opts
[ https://issues.apache.org/jira/browse/TEZ-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114371#comment-14114371 ] Gopal V commented on TEZ-1509: -- +1 on the new options. > Set a useful default value for java opts > -- > > Key: TEZ-1509 > URL: https://issues.apache.org/jira/browse/TEZ-1509 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Bikas Saha > Attachments: TEZ-1509.1.patch > > > A subset of the following should be considered for the defaults: > -server -XX:+UseCompressedStrings -Djava.net.preferIPv4Stack=true > -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA > -XX:+UseParallelGC -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1376) Investigate independent parallel DAGs execution in Local Mode
[ https://issues.apache.org/jira/browse/TEZ-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated TEZ-1376: - Attachment: (was: differentParallelJob.patch) > Investigate independent parallel DAGs execution in Local Mode > - > > Key: TEZ-1376 > URL: https://issues.apache.org/jira/browse/TEZ-1376 > Project: Apache Tez > Issue Type: Sub-task >Affects Versions: 0.4.1 >Reporter: Chen He >Assignee: Chen He > Attachments: simpleTestCase.patch > > > Pig on Tez allows user to submit parallel DAGs in a single pig script. Those > DAGs could be independent and concurrent. Current LocalMode may encounter > some problems when concurrent parallel DAGs are submitted. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1376) Investigate independent parallel DAGs execution in Local Mode
[ https://issues.apache.org/jira/browse/TEZ-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated TEZ-1376: - Attachment: (was: differentParallelJob-surefireOut.txt) > Investigate independent parallel DAGs execution in Local Mode > - > > Key: TEZ-1376 > URL: https://issues.apache.org/jira/browse/TEZ-1376 > Project: Apache Tez > Issue Type: Sub-task >Affects Versions: 0.4.1 >Reporter: Chen He >Assignee: Chen He > Attachments: simpleTestCase.patch > > > Pig on Tez allows user to submit parallel DAGs in a single pig script. Those > DAGs could be independent and concurrent. Current LocalMode may encounter > some problems when concurrent parallel DAGs are submitted. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1495) ATS integration for TezClient
[ https://issues.apache.org/jira/browse/TEZ-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114393#comment-14114393 ] Hitesh Shah commented on TEZ-1495: -- Overall comments: - please look at TimelineClientImpl in yarn code for supporting secure clusters - At this point, it looks like there is no timeline client dependency so this should be ok to be in tez-api. - Regarding the current switch between AM and ATS, I think it should probably be a one-time switch. Once the client switches over to ATS, it should stick to ATS. This implies knowing when the final switchover needs to happen i.e on AM completion or when the session switches to a different DAG. You may wish to look at TestOrderedWordCount and modify it to test whether DAGClient for the client for the first dag works as intended while the second dag is running. Regarding {code} +// Status of the DAG is updated only when it completes. default to RUNNING if no status found +// as the ATS is not updated until the status of dag is running. {code} - maybe the DAG status should be updated to running whenever the dag started event is logged? > ATS integration for TezClient > - > > Key: TEZ-1495 > URL: https://issues.apache.org/jira/browse/TEZ-1495 > Project: Apache Tez > Issue Type: Bug >Reporter: Prakash Ramachandran >Assignee: Prakash Ramachandran > Attachments: TEZ-1495.1.patch, TEZ-1495.2.patch, TEZ-1495.WIP.1.patch > > > Tez client should automatically redirect to ATS when the AM is not running. > All APIs exposed ( DAG status, counters, etc ) from the DAGClient should > continue to work after the AM has shut down. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1376) Investigate independent parallel DAGs execution in Local Mode
[ https://issues.apache.org/jira/browse/TEZ-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated TEZ-1376: - Attachment: independantDAG2.data independantDAG1.data independantDAG.pig a pig example with input files to test TEZ Local Mode > Investigate independent parallel DAGs execution in Local Mode > - > > Key: TEZ-1376 > URL: https://issues.apache.org/jira/browse/TEZ-1376 > Project: Apache Tez > Issue Type: Sub-task >Affects Versions: 0.4.1 >Reporter: Chen He >Assignee: Chen He > Attachments: independantDAG.pig, independantDAG1.data, > independantDAG2.data, simpleTestCase.patch > > > Pig on Tez allows user to submit parallel DAGs in a single pig script. Those > DAGs could be independent and concurrent. Current LocalMode may encounter > some problems when concurrent parallel DAGs are submitted. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (TEZ-1376) Investigate independent parallel DAGs execution in Local Mode
[ https://issues.apache.org/jira/browse/TEZ-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114392#comment-14114392 ] Chen He edited comment on TEZ-1376 at 8/28/14 9:37 PM: --- a pig example with input files to test TEZ Local Mode running independent DAGs. was (Author: airbots): a pig example with input files to test TEZ Local Mode > Investigate independent parallel DAGs execution in Local Mode > - > > Key: TEZ-1376 > URL: https://issues.apache.org/jira/browse/TEZ-1376 > Project: Apache Tez > Issue Type: Sub-task >Affects Versions: 0.4.1 >Reporter: Chen He >Assignee: Chen He > Attachments: independantDAG.pig, independantDAG1.data, > independantDAG2.data, simpleTestCase.patch > > > Pig on Tez allows user to submit parallel DAGs in a single pig script. Those > DAGs could be independent and concurrent. Current LocalMode may encounter > some problems when concurrent parallel DAGs are submitted. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Issue Comment Deleted] (TEZ-1376) Investigate independent parallel DAGs execution in Local Mode
[ https://issues.apache.org/jira/browse/TEZ-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated TEZ-1376: - Comment: was deleted (was: The sequential number looks incorrect since LocalClient use "appIdNumber++" to create the incremental sequential number "0001". This is not a critical issue. The important is that I create both session and non-session parallel different job test. Both of them report "org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configurartion". I am investigating the reason.) > Investigate independent parallel DAGs execution in Local Mode > - > > Key: TEZ-1376 > URL: https://issues.apache.org/jira/browse/TEZ-1376 > Project: Apache Tez > Issue Type: Sub-task >Affects Versions: 0.4.1 >Reporter: Chen He >Assignee: Chen He > Attachments: independantDAG.pig, independantDAG1.data, > independantDAG2.data, simpleTestCase.patch > > > Pig on Tez allows user to submit parallel DAGs in a single pig script. Those > DAGs could be independent and concurrent. Current LocalMode may encounter > some problems when concurrent parallel DAGs are submitted. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1510) TezConfiguration should not add tez-site.xml as a default resource.
[ https://issues.apache.org/jira/browse/TEZ-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-1510: - Attachment: TEZ-1510.3.missing-file.patch Add missing file needed for test > TezConfiguration should not add tez-site.xml as a default resource. > > > Key: TEZ-1510 > URL: https://issues.apache.org/jira/browse/TEZ-1510 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Hitesh Shah >Priority: Blocker > Fix For: 0.5.0 > > Attachments: TEZ-1510.1.patch, TEZ-1510.2.patch, > TEZ-1510.3.addendum.patch, TEZ-1510.3.missing-file.patch, TEZ-1510.3.patch > > > Currently on the first construction of a TezConfiguration, tez-site.xml gets > added as a static resource for all future Configuration objects. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1488) Implement HashComparator in TezBytesComparator
[ https://issues.apache.org/jira/browse/TEZ-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114428#comment-14114428 ] Siddharth Seth commented on TEZ-1488: - +1. Looks good. Some documentation on the HashComparator interface will be useful on what it is used for. Also is getHashCode the correct name - something like extractKeyMSBytes may be more appropriate. > Implement HashComparator in TezBytesComparator > - > > Key: TEZ-1488 > URL: https://issues.apache.org/jira/browse/TEZ-1488 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: TEZ-1488.1.patch > > > Speed up TezBytesComparator by ~20% when used in PipelinedSorter. > This moves part of the key comparator into the partition comparator, which is > a single register operation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1488) Implement HashComparator in TezBytesComparator
[ https://issues.apache.org/jira/browse/TEZ-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114448#comment-14114448 ] Gopal V commented on TEZ-1488: -- I think we should call the interface what it really is - a ProxyComparator? I will do the renames & write docs. > Implement HashComparator in TezBytesComparator > - > > Key: TEZ-1488 > URL: https://issues.apache.org/jira/browse/TEZ-1488 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: TEZ-1488.1.patch > > > Speed up TezBytesComparator by ~20% when used in PipelinedSorter. > This moves part of the key comparator into the partition comparator, which is > a single register operation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1517) Avoid sending routed events via the AsyncDispatcher
[ https://issues.apache.org/jira/browse/TEZ-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114460#comment-14114460 ] Hitesh Shah commented on TEZ-1517: -- Seems reasonable. I am assuming that this is after the events pass VertexImpl routing, hence recovery will not be affected. The only problem is that the change is not thread-safe - the list is a simple array list and the function is not synchronized/locked as needed. > Avoid sending routed events via the AsyncDispatcher > --- > > Key: TEZ-1517 > URL: https://issues.apache.org/jira/browse/TEZ-1517 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Critical > Attachments: TEZ-1517.1.txt > > > Sending them via the queue ends up creating lots of unnecessary objects > (millions for a large job), as well as blocking the queue. > Eventually, event routing should be handed over to a separate thread - so > that the asyncdispatcher is unblocked to continue operations like launching > tasks, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1515) DAGAppMaster : Thread contentions due to org.apache.tez.common.counters.ResourceBundles
[ https://issues.apache.org/jira/browse/TEZ-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114486#comment-14114486 ] Siddharth Seth commented on TEZ-1515: - [~rajesh.balamohan] - should we just remove support for resource bundles and localization. This would remove display names as well - which can be added back with a simpler mechanism. > DAGAppMaster : Thread contentions due to > org.apache.tez.common.counters.ResourceBundles > --- > > Key: TEZ-1515 > URL: https://issues.apache.org/jira/browse/TEZ-1515 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan > Labels: performance > Attachments: DAGAppMaster_AsyncDispatcher.png, > HistoryLoggingService.png, RecoveryService.png, > detailed_sample_stack_trace.txt > > > Thread profiling DagAppMaster for a synthetic tez test revealed lots of > contentions in RecoveryService / HistoryEventHandlingThread / AsyncDispatcher > threads. All of these try to access tez counters and are blocked on "public > static synchronized T getValue(String bundleName, String key,String > suffix, T defaultValue)". > I will attach the thread profiler snapshots soon. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1488) Implement HashComparator in TezBytesComparator
[ https://issues.apache.org/jira/browse/TEZ-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114489#comment-14114489 ] Siddharth Seth commented on TEZ-1488: - ComparatorPrefixGenerator ? > Implement HashComparator in TezBytesComparator > - > > Key: TEZ-1488 > URL: https://issues.apache.org/jira/browse/TEZ-1488 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: TEZ-1488.1.patch > > > Speed up TezBytesComparator by ~20% when used in PipelinedSorter. > This moves part of the key comparator into the partition comparator, which is > a single register operation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1517) Avoid sending routed events via the AsyncDispatcher
[ https://issues.apache.org/jira/browse/TEZ-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114503#comment-14114503 ] Siddharth Seth commented on TEZ-1517: - It's still running on the EventHandler thread - just not via separate events. So the usage continues to be the same as it was. I haven't added a separate thread here, will be creating a separate jira for that and will address synchronization at that point. > Avoid sending routed events via the AsyncDispatcher > --- > > Key: TEZ-1517 > URL: https://issues.apache.org/jira/browse/TEZ-1517 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Critical > Attachments: TEZ-1517.1.txt > > > Sending them via the queue ends up creating lots of unnecessary objects > (millions for a large job), as well as blocking the queue. > Eventually, event routing should be handed over to a separate thread - so > that the asyncdispatcher is unblocked to continue operations like launching > tasks, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1509) Set a useful default value for java opts
[ https://issues.apache.org/jira/browse/TEZ-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114527#comment-14114527 ] Tsuyoshi OZAWA commented on TEZ-1509: - The new option looks good to me. > Set a useful default value for java opts > -- > > Key: TEZ-1509 > URL: https://issues.apache.org/jira/browse/TEZ-1509 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Bikas Saha > Attachments: TEZ-1509.1.patch > > > A subset of the following should be considered for the defaults: > -server -XX:+UseCompressedStrings -Djava.net.preferIPv4Stack=true > -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA > -XX:+UseParallelGC -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1517) Avoid sending routed events via the AsyncDispatcher
[ https://issues.apache.org/jira/browse/TEZ-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-1517: Attachment: TEZ-1517.2.txt This does require minimal locking even here - which was earlier being done by the handleEvent invocation. Yes, recovery is not impacted - this is after the initial events come in. Committing in a bit, thanks for the review. > Avoid sending routed events via the AsyncDispatcher > --- > > Key: TEZ-1517 > URL: https://issues.apache.org/jira/browse/TEZ-1517 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Critical > Attachments: TEZ-1517.1.txt, TEZ-1517.2.txt > > > Sending them via the queue ends up creating lots of unnecessary objects > (millions for a large job), as well as blocking the queue. > Eventually, event routing should be handed over to a separate thread - so > that the asyncdispatcher is unblocked to continue operations like launching > tasks, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (TEZ-1519) TezTaskRunner should not initialize TezConfiguration in TezChild
Hitesh Shah created TEZ-1519: Summary: TezTaskRunner should not initialize TezConfiguration in TezChild Key: TEZ-1519 URL: https://issues.apache.org/jira/browse/TEZ-1519 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Priority: Blocker Should be doing a new Configuration and augmenting with the config data from tez-conf.pb. Need confirmation on tez-conf.pb being localized for all containers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1515) DAGAppMaster : Thread contentions due to org.apache.tez.common.counters.ResourceBundles
[ https://issues.apache.org/jira/browse/TEZ-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-1515: -- Attachment: TEZ-1515.1.patch Attachment removes the resource bundle/localization usage. [~sseth] Can you please review? > DAGAppMaster : Thread contentions due to > org.apache.tez.common.counters.ResourceBundles > --- > > Key: TEZ-1515 > URL: https://issues.apache.org/jira/browse/TEZ-1515 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan > Labels: performance > Attachments: DAGAppMaster_AsyncDispatcher.png, > HistoryLoggingService.png, RecoveryService.png, TEZ-1515.1.patch, > detailed_sample_stack_trace.txt > > > Thread profiling DagAppMaster for a synthetic tez test revealed lots of > contentions in RecoveryService / HistoryEventHandlingThread / AsyncDispatcher > threads. All of these try to access tez counters and are blocked on "public > static synchronized T getValue(String bundleName, String key,String > suffix, T defaultValue)". > I will attach the thread profiler snapshots soon. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1515) DAGAppMaster : Thread contentions due to org.apache.tez.common.counters.ResourceBundles
[ https://issues.apache.org/jira/browse/TEZ-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114597#comment-14114597 ] Siddharth Seth commented on TEZ-1515: - Looks good to me. Should remove the ResourceBundles class as well if it isn't used. > DAGAppMaster : Thread contentions due to > org.apache.tez.common.counters.ResourceBundles > --- > > Key: TEZ-1515 > URL: https://issues.apache.org/jira/browse/TEZ-1515 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan > Labels: performance > Attachments: DAGAppMaster_AsyncDispatcher.png, > HistoryLoggingService.png, RecoveryService.png, TEZ-1515.1.patch, > detailed_sample_stack_trace.txt > > > Thread profiling DagAppMaster for a synthetic tez test revealed lots of > contentions in RecoveryService / HistoryEventHandlingThread / AsyncDispatcher > threads. All of these try to access tez counters and are blocked on "public > static synchronized T getValue(String bundleName, String key,String > suffix, T defaultValue)". > I will attach the thread profiler snapshots soon. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1515) DAGAppMaster : Thread contentions due to org.apache.tez.common.counters.ResourceBundles
[ https://issues.apache.org/jira/browse/TEZ-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114598#comment-14114598 ] Siddharth Seth commented on TEZ-1515: - And open a follow up jira to support displayNames better. > DAGAppMaster : Thread contentions due to > org.apache.tez.common.counters.ResourceBundles > --- > > Key: TEZ-1515 > URL: https://issues.apache.org/jira/browse/TEZ-1515 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan > Labels: performance > Attachments: DAGAppMaster_AsyncDispatcher.png, > HistoryLoggingService.png, RecoveryService.png, TEZ-1515.1.patch, > detailed_sample_stack_trace.txt > > > Thread profiling DagAppMaster for a synthetic tez test revealed lots of > contentions in RecoveryService / HistoryEventHandlingThread / AsyncDispatcher > threads. All of these try to access tez counters and are blocked on "public > static synchronized T getValue(String bundleName, String key,String > suffix, T defaultValue)". > I will attach the thread profiler snapshots soon. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1515) DAGAppMaster : Thread contentions due to org.apache.tez.common.counters.ResourceBundles
[ https://issues.apache.org/jira/browse/TEZ-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-1515: -- Attachment: TEZ-1515.2.patch Removed ResourceBundles in attached patch. > DAGAppMaster : Thread contentions due to > org.apache.tez.common.counters.ResourceBundles > --- > > Key: TEZ-1515 > URL: https://issues.apache.org/jira/browse/TEZ-1515 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan > Labels: performance > Attachments: DAGAppMaster_AsyncDispatcher.png, > HistoryLoggingService.png, RecoveryService.png, TEZ-1515.1.patch, > TEZ-1515.2.patch, detailed_sample_stack_trace.txt > > > Thread profiling DagAppMaster for a synthetic tez test revealed lots of > contentions in RecoveryService / HistoryEventHandlingThread / AsyncDispatcher > threads. All of these try to access tez counters and are blocked on "public > static synchronized T getValue(String bundleName, String key,String > suffix, T defaultValue)". > I will attach the thread profiler snapshots soon. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (TEZ-1520) TezCounters: support displayNames better
Rajesh Balamohan created TEZ-1520: - Summary: TezCounters: support displayNames better Key: TEZ-1520 URL: https://issues.apache.org/jira/browse/TEZ-1520 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1520) TezCounters: support better displayNames
[ https://issues.apache.org/jira/browse/TEZ-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-1520: -- Summary: TezCounters: support better displayNames (was: TezCounters: support displayNames better) > TezCounters: support better displayNames > > > Key: TEZ-1520 > URL: https://issues.apache.org/jira/browse/TEZ-1520 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1495) ATS integration for TezClient
[ https://issues.apache.org/jira/browse/TEZ-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114678#comment-14114678 ] Prakash Ramachandran commented on TEZ-1495: --- [~hitesh] bq. Regarding the current switch between AM and ATS, I think it should probably be a one-time switch. The client can switch to ATS on AM completion. However, if it also handles the case of AM relaunch (i.e. it switched to ATS when the AM was not reachable) and the AM completion event (from the new AM after it completes) for some reason does not reach ATS, won't this cause an indefinite wait? > ATS integration for TezClient > - > > Key: TEZ-1495 > URL: https://issues.apache.org/jira/browse/TEZ-1495 > Project: Apache Tez > Issue Type: Bug >Reporter: Prakash Ramachandran >Assignee: Prakash Ramachandran > Attachments: TEZ-1495.1.patch, TEZ-1495.2.patch, TEZ-1495.WIP.1.patch > > > Tez client should automatically redirect to ATS when the AM is not running. > All APIs exposed ( DAG status, counters, etc ) from the DAGClient should > continue to work after the AM has shut down. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1488) Implement HashComparator in TezBytesComparator
[ https://issues.apache.org/jira/browse/TEZ-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114746#comment-14114746 ] Gopal V commented on TEZ-1488: -- This was originally an interface for BinaryComparable called PrefixComparable etc. http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-issues/201211.mbox/%3C1195493022.84126.1352331614565.JavaMail.jiratomcat@arcas%3E I'd like to retain those roots with a ProxyComparator because what it returns is really a proxy for key comparisons. That is it implies that getProxy(k1) < getProxy(k2) ==> k1 < k2 but getProxy(k1) == getProxy(k2) ==> k1 ?? k2 (no relation) > Implement HashComparator in TezBytesComparator > - > > Key: TEZ-1488 > URL: https://issues.apache.org/jira/browse/TEZ-1488 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: TEZ-1488.1.patch > > > Speed up TezBytesComparator by ~20% when used in PipelinedSorter. > This moves part of the key comparator into the partition comparator, which is > a single register operation. -- This message was sent by Atlassian JIRA (v6.2#6252)
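The proxy contract stated in the TEZ-1488 comment above (a smaller proxy implies a smaller key, while equal proxies imply nothing) can be sketched as follows. This is an illustration under assumptions, not the patch attached to the issue: `ProxyComparatorSketch` and its methods are hypothetical, keys are taken to be byte arrays ordered lexicographically by unsigned byte value, and the proxy is the first four bytes packed into a `long` (zero-padded for short keys) so that no special handling of the sign bit is needed.

```java
final class ProxyComparatorSketch {

    // Packs the first four key bytes, big-endian and unsigned, into a long.
    // Keys shorter than four bytes are padded with zeros.
    static long getProxy(byte[] key) {
        long proxy = 0;
        for (int i = 0; i < 4; i++) {
            int b = i < key.length ? (key[i] & 0xFF) : 0;
            proxy = (proxy << 8) | b;
        }
        return proxy;
    }

    // Full lexicographic comparison on unsigned bytes; shorter prefix sorts first.
    static int compareFull(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Cheap proxy comparison first; fall back to the full comparison only
    // when the proxies are equal, since equal proxies say nothing about order.
    static int compare(byte[] a, byte[] b) {
        long pa = getProxy(a);
        long pb = getProxy(b);
        if (pa != pb) {
            return pa < pb ? -1 : 1;
        }
        return compareFull(a, b);
    }
}
```

In a sorter, the proxy would typically be computed once per record and stored next to the partition, so most comparisons are settled without touching the key bytes.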
[jira] [Commented] (TEZ-1510) TezConfiguration should not add tez-site.xml as a default resource.
[ https://issues.apache.org/jira/browse/TEZ-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114771#comment-14114771 ] Bikas Saha commented on TEZ-1510: - Please comment on whether this is safe for 0.5.0 or not. > TezConfiguration should not add tez-site.xml as a default resource. > > > Key: TEZ-1510 > URL: https://issues.apache.org/jira/browse/TEZ-1510 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Hitesh Shah >Priority: Blocker > Fix For: 0.5.0 > > Attachments: TEZ-1510.1.patch, TEZ-1510.2.patch, > TEZ-1510.3.addendum.patch, TEZ-1510.3.missing-file.patch, TEZ-1510.3.patch > > > Currently on the first construction of a TezConfiguration, tez-site.xml gets > added as a static resource for all future Configuration objects. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1511) MROutputConfigBuilder sets OutputFormat as String class if OutputFormat is not provided
[ https://issues.apache.org/jira/browse/TEZ-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114773#comment-14114773 ] Bikas Saha commented on TEZ-1511: - UnionExample is run today as an e2e test for the build. So this captures e2e regression from a Hive/Pig usage point of view. The Precondition check is already acting as a built in self test. > MROutputConfigBuilder sets OutputFormat as String class if OutputFormat is > not provided > --- > > Key: TEZ-1511 > URL: https://issues.apache.org/jira/browse/TEZ-1511 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Bikas Saha >Priority: Blocker > Attachments: TEZ-1511.1.patch, TEZ-1511.2.patch, TEZ-1511.3.patch > > > Code uses: > {code} > this.outputFormat = > ReflectionUtils.getClass(conf.get(MRJobConfig.OUTPUT_FORMAT_CLASS_ATTR)); > } else { > this.outputFormat = > ReflectionUtils.getClass(conf.get("mapred.output.format.class")); > {code} > where ReflectionUtils has : > {code} > Class getClass(T o) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
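The pitfall in the issue description above can be reproduced without Hadoop on the classpath. The sketch below uses a stand-in helper with the same shape as the quoted `Class getClass(T o)`; its body (returning the runtime class of the argument) is an assumption based on that signature, and the class name `GetClassPitfall` and the example class name string are hypothetical.

```java
/** Stand-in for the quoted helper; illustrates why a class-name String yields String.class. */
final class GetClassPitfall {

    /** Same shape as the quoted signature: returns the runtime class of the argument itself. */
    @SuppressWarnings("unchecked")
    static <T> Class<T> getClass(T o) {
        return (Class<T>) o.getClass();
    }

    /** Resolves a configured class name to the class it names. */
    static Class<?> classForName(String name) throws ClassNotFoundException {
        return Class.forName(name);
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Stands in for a value read from configuration, i.e. a class name as a String.
        String configured = "java.util.ArrayList";
        // The argument is a String, so the helper reports String, not the named class.
        System.out.println(getClass(configured));
        // Looking the name up returns the class the configuration actually refers to.
        System.out.println(classForName(configured));
    }
}
```

Because a configuration lookup returns a `String`, handing that value to a helper of this shape can only ever describe the `String` type, which is consistent with the summary of the issue.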
[jira] [Updated] (TEZ-1511) MROutputConfigBuilder sets OutputFormat as String class if OutputFormat is not provided
[ https://issues.apache.org/jira/browse/TEZ-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-1511: Attachment: TEZ-1511.4.patch Attaching patch that creates configs for mapper/reduce.new-api in MRJobConfig instead of the private constants. These are still private to Tez overall. > MROutputConfigBuilder sets OutputFormat as String class if OutputFormat is > not provided > --- > > Key: TEZ-1511 > URL: https://issues.apache.org/jira/browse/TEZ-1511 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Bikas Saha >Priority: Blocker > Attachments: TEZ-1511.1.patch, TEZ-1511.2.patch, TEZ-1511.3.patch, > TEZ-1511.4.patch > > > Code uses: > {code} > this.outputFormat = > ReflectionUtils.getClass(conf.get(MRJobConfig.OUTPUT_FORMAT_CLASS_ATTR)); > } else { > this.outputFormat = > ReflectionUtils.getClass(conf.get("mapred.output.format.class")); > {code} > where ReflectionUtils has : > {code} > Class getClass(T o) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1495) ATS integration for TezClient
[ https://issues.apache.org/jira/browse/TEZ-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114849#comment-14114849 ] Hitesh Shah commented on TEZ-1495: -- bq. can switch to ATS on AM completion. however if it also handles the case of AM relaunch Sorry should have clarified. it should be a one-time switch on application completion ( i.e. after all AM attempts finish ). Based on the current implementation, it is switching when the AM process goes down i.e it would switch to ATS for a temporary period but then switch back to the next AM attempt. However, at this point, it would also need to monitor the application report from YARN to check whether the application has completed or not. bq. the event for AM completion (from new AM after it completes) for some reason does not reach ATS wont this cause a indefinite wait Could you shed more clarity on this. The ATS data need not be definitive though polling the final application state from the RM would be enough to short-circuit the wait loop. > ATS integration for TezClient > - > > Key: TEZ-1495 > URL: https://issues.apache.org/jira/browse/TEZ-1495 > Project: Apache Tez > Issue Type: Bug >Reporter: Prakash Ramachandran >Assignee: Prakash Ramachandran > Attachments: TEZ-1495.1.patch, TEZ-1495.2.patch, TEZ-1495.WIP.1.patch > > > Tez client should automatically redirect to ATS when the AM is not running. > All APIs exposed ( DAG status, counters, etc ) from the DAGClient should > continue to work after the AM has shut down. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (TEZ-1495) ATS integration for TezClient
[ https://issues.apache.org/jira/browse/TEZ-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114849#comment-14114849 ] Hitesh Shah edited comment on TEZ-1495 at 8/29/14 4:42 AM: --- bq. can switch to ATS on AM completion. however if it also handles the case of AM relaunch Sorry should have clarified. it should be a one-time switch on application completion ( i.e. after all AM attempts finish ). Based on the current implementation, it is switching when the AM process goes down i.e it would switch to ATS for a temporary period but then switch back to the next AM attempt. However, at this point, it would also need to monitor the application report from YARN to check whether the application has completed or not. bq. the event for AM completion (from new AM after it completes) for some reason does not reach ATS wont this cause a indefinite wait Could you shed more clarity on this. The ATS data need not be definitive though polling the final application state from the RM would be enough to short-circuit the wait loop ( with some level of waiting to ensure that any delay in propagating data from AM to ATS is accounted for ). was (Author: hitesh): bq. can switch to ATS on AM completion. however if it also handles the case of AM relaunch Sorry should have clarified. it should be a one-time switch on application completion ( i.e. after all AM attempts finish ). Based on the current implementation, it is switching when the AM process goes down i.e it would switch to ATS for a temporary period but then switch back to the next AM attempt. However, at this point, it would also need to monitor the application report from YARN to check whether the application has completed or not. bq. the event for AM completion (from new AM after it completes) for some reason does not reach ATS wont this cause a indefinite wait Could you shed more clarity on this. 
The ATS data need not be definitive though polling the final application state from the RM would be enough to short-circuit the wait loop. > ATS integration for TezClient > - > > Key: TEZ-1495 > URL: https://issues.apache.org/jira/browse/TEZ-1495 > Project: Apache Tez > Issue Type: Bug >Reporter: Prakash Ramachandran >Assignee: Prakash Ramachandran > Attachments: TEZ-1495.1.patch, TEZ-1495.2.patch, TEZ-1495.WIP.1.patch > > > Tez client should automatically redirect to ATS when the AM is not running. > All APIs exposed ( DAG status, counters, etc ) from the DAGClient should > continue to work after the AM has shut down. -- This message was sent by Atlassian JIRA (v6.2#6252)
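The one-time switch discussed in the comment above might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the TEZ-1495 patch: `DagStatusRouter`, its three suppliers, and the placeholder return strings are hypothetical, and the suppliers stand in for the YARN application report, the AM client, and the ATS client respectively. A real client would also sleep between ATS polls to allow for propagation delay from the AM to ATS.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/** Hypothetical client-side routing between the AM and ATS; all names are illustrative. */
final class DagStatusRouter {
    private final BooleanSupplier appFinishedPerRm; // stand-in for polling the final app state from the RM
    private final Supplier<String> amStatus;        // returns null when no AM attempt is reachable
    private final Supplier<String> atsStatus;       // returns null while ATS has no data for the DAG
    private boolean switchedToAts = false;

    DagStatusRouter(BooleanSupplier appFinishedPerRm,
                    Supplier<String> amStatus,
                    Supplier<String> atsStatus) {
        this.appFinishedPerRm = appFinishedPerRm;
        this.amStatus = amStatus;
        this.atsStatus = atsStatus;
    }

    /** Returns a status string, switching to ATS once and only after the RM reports completion. */
    String getStatus(int maxAtsPolls) {
        if (!switchedToAts && appFinishedPerRm.getAsBoolean()) {
            switchedToAts = true; // one-time switch: never go back to the AM afterwards
        }
        if (!switchedToAts) {
            // The application is still running, so an unreachable AM may just be between attempts.
            String status = amStatus.get();
            return status != null ? status : "RETRY_AM";
        }
        // Bounded wait: the RM already says the application is done, so do not loop forever.
        for (int i = 0; i < maxAtsPolls; i++) {
            String status = atsStatus.get();
            if (status != null) {
                return status;
            }
        }
        return "ATS_DATA_MISSING";
    }
}
```

Bounding the ATS polls is what short-circuits the wait loop in this sketch: once the RM reports the application as finished, a missing completion event in ATS leads to an explicit "data missing" result instead of an indefinite wait.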
[jira] [Commented] (TEZ-1510) TezConfiguration should not add tez-site.xml as a default resource.
[ https://issues.apache.org/jira/browse/TEZ-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114851#comment-14114851 ] Hitesh Shah commented on TEZ-1510: -- Ran a few example jobs and did not see any issues after this patch. > TezConfiguration should not add tez-site.xml as a default resource. > > > Key: TEZ-1510 > URL: https://issues.apache.org/jira/browse/TEZ-1510 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Hitesh Shah >Priority: Blocker > Fix For: 0.5.0 > > Attachments: TEZ-1510.1.patch, TEZ-1510.2.patch, > TEZ-1510.3.addendum.patch, TEZ-1510.3.missing-file.patch, TEZ-1510.3.patch > > > Currently on the first construction of a TezConfiguration, tez-site.xml gets > added as a static resource for all future Configuration objects. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (TEZ-1509) Set a useful default value for java opts
[ https://issues.apache.org/jira/browse/TEZ-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha resolved TEZ-1509. - Resolution: Fixed Fix Version/s: 0.5.0 Hadoop Flags: Incompatible change,Reviewed (was: Incompatible change) commit db4161b03d6535d79ed5c337a190b55f3ea1f198 Author: Bikas Saha Date: Thu Aug 28 21:51:29 2014 -0700 TEZ-1509. Set a useful default value for java opts (bikas) > Set a useful default value for java opts > -- > > Key: TEZ-1509 > URL: https://issues.apache.org/jira/browse/TEZ-1509 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Bikas Saha > Fix For: 0.5.0 > > Attachments: TEZ-1509.1.patch > > > A subset of the following should be considered for the defaults: > -server -XX:+UseCompressedStrings -Djava.net.preferIPv4Stack=true > -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA > -XX:+UseParallelGC -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (TEZ-1521) VertexDataMovementEventsGeneratedEvent is logged twice in recovery log for InputDataInformation
Jeff Zhang created TEZ-1521: --- Summary: VertexDataMovementEventsGeneratedEvent is logged twice in recovery log for InputDataInformation Key: TEZ-1521 URL: https://issues.apache.org/jira/browse/TEZ-1521 Project: Apache Tez Issue Type: Bug Reporter: Jeff Zhang Assignee: Jeff Zhang -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1488) Implement HashComparator in TezBytesComparator
[ https://issues.apache.org/jira/browse/TEZ-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated TEZ-1488: - Attachment: TEZ-1488.2.patch Patch with javadocs + test-case > Implement HashComparator in TezBytesComparator > - > > Key: TEZ-1488 > URL: https://issues.apache.org/jira/browse/TEZ-1488 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: TEZ-1488.1.patch, TEZ-1488.2.patch > > > Speed up TezBytesComparator by ~20% when used in PipelinedSorter. > This moves part of the key comparator into the partition comparator, which is > a single register operation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (TEZ-1521) VertexDataMovementEventsGeneratedEvent may be logged twice in recovery log
[ https://issues.apache.org/jira/browse/TEZ-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated TEZ-1521: Summary: VertexDataMovementEventsGeneratedEvent may be logged twice in recovery log (was: VertexDataMovementEventsGeneratedEvent is logged twice in recovery log for InputDataInformation) > VertexDataMovementEventsGeneratedEvent may be logged twice in recovery log > --- > > Key: TEZ-1521 > URL: https://issues.apache.org/jira/browse/TEZ-1521 > Project: Apache Tez > Issue Type: Bug >Reporter: Jeff Zhang >Assignee: Jeff Zhang > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TEZ-1516) Log transfer rate for Broadcast Fetch
[ https://issues.apache.org/jira/browse/TEZ-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114947#comment-14114947 ] Siddharth Seth commented on TEZ-1516: - [~rajesh.balamohan] - could you please take a look. > Log transfer rate for Broadcast Fetch > - > > Key: TEZ-1516 > URL: https://issues.apache.org/jira/browse/TEZ-1516 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: TEZ-1516.1.txt > > -- This message was sent by Atlassian JIRA (v6.2#6252)