[jira] [Commented] (TEZ-2199) updateLocalResourcesForInputSplits assumes wrongly that split data is on same FS as the default FS

2015-03-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361454#comment-14361454
 ] 

Hadoop QA commented on TEZ-2199:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12704551/TEZ-2199.1.patch
  against master revision b18552b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/303//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/303//console

This message is automatically generated.

> updateLocalResourcesForInputSplits assumes wrongly that split data is on same 
> FS as the default FS
> --
>
> Key: TEZ-2199
> URL: https://issues.apache.org/jira/browse/TEZ-2199
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
> Attachments: TEZ-2199.1.patch
>
>
> Seen in a Windows Azure scenario:
> Caused by: java.io.FileNotFoundException: 
> hdfs://namenode:9000/hive/scratch/_tez_scratch_dir/split_Map_1/job.split: No 
> such file or directory.
>   at 
> org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:1625)
>   at 
> org.apache.tez.mapreduce.hadoop.MRInputHelpers.updateLocalResourcesForInputSplits(MRInputHelpers.java:639)
>   at 
> org.apache.tez.mapreduce.hadoop.MRInputHelpers.configureMRInputWithLegacySplitGeneration(MRInputHelpers.java:115)
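
The trace above shows an hdfs:// path being passed to NativeAzureFileSystem, which is consistent with the file system being taken from the default FS instead of from the path itself. Below is a minimal, JDK-only sketch of the distinction. `SplitFsResolver`, `schemeFor`, and the `wasb://` default URI are hypothetical names for illustration and are not taken from the patch. In Hadoop code the usual way to get this behaviour is to call `Path#getFileSystem(Configuration)` rather than `FileSystem.get(Configuration)`.

```java
import java.net.URI;

final class SplitFsResolver {
    /**
     * Returns the scheme of the file system that owns the path, falling back
     * to the default FS only when the path carries no scheme of its own.
     */
    static String schemeFor(URI path, URI defaultFs) {
        String scheme = path.getScheme();
        return scheme != null ? scheme : defaultFs.getScheme();
    }

    public static void main(String[] args) {
        // Hypothetical default FS for an Azure-backed cluster.
        URI defaultFs = URI.create("wasb://container@account.blob.core.windows.net/");
        URI split = URI.create(
            "hdfs://namenode:9000/hive/scratch/_tez_scratch_dir/split_Map_1/job.split");
        System.out.println(schemeFor(split, defaultFs));
        System.out.println(schemeFor(URI.create("/tmp/job.split"), defaultFs));
    }
}
```

A fully qualified split path keeps its own scheme; only a scheme-less path falls back to the default FS.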



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Success: TEZ-2199 PreCommit Build #303

2015-03-13 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2199
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/303/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2748 lines...]
[INFO] Final Memory: 70M/967M
[INFO] 





==
==
Adding comment to Jira.
==
==


Comment added.
ae43047303f0ec5a143958f9f93034a82f2a8dbd logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #297
Archived 44 artifacts
Archive block size is 32768
Received 4 blocks and 2592123 bytes
Compression is 4.8%
Took 0.91 sec
Description set: TEZ-2199
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-1909) Remove need to copy over all events from attempt 1 to attempt 2 dir

2015-03-13 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361426#comment-14361426
 ] 

Hitesh Shah commented on TEZ-1909:
--

Comments:

- Any reason why "Set getDagIDs()" is needed in the DAGAppMaster? 
- The "if (skipAllOtherEvents) {" check is probably also needed at the top of 
the loop to prevent new files from being opened and read (in addition to 
short-circuiting the read of all events in the given file). Maybe just log a 
message that other files were present and skipped.
- I do not see TEZ_AM_RECOVERY_HANDLE_REMAINING_EVENT_WHEN_STOPPED being used 
anywhere apart from being set to true in one of the tests.
- please replace "import com.sun.tools.javac.util.List;" with java.util.List
- testCorruptedLastRecord should also verify that the dag submitted event was 
seen. 
- also, we should add a test for adding corrupt data to the summary stream and 
ensuring that its processing fails
- there may not be a need to add "getDAGNames()". Instead, you can just use 
"dagAppMaster.dagNames.add(dagSummaryData.dagName);" as dagNames should be 
package-private.
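
A rough sketch of the second point above, assuming the recovery parser walks a list of files and reads events from each one: checking the flag at the top of the loop means later files are never opened. `RecoveryFileScan`, `scan`, and the "STOP" marker are hypothetical stand-ins, not names from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RecoveryFileScan {
    /** Returns the names of the files that were actually read. */
    static List<String> scan(List<String> fileNames, List<List<String>> eventsPerFile) {
        List<String> opened = new ArrayList<>();
        boolean skipAllOtherEvents = false;
        for (int i = 0; i < fileNames.size(); i++) {
            if (skipAllOtherEvents) {
                // Checked before the file is "opened", so no further I/O happens.
                System.out.println("Skipping recovery file " + fileNames.get(i));
                continue;
            }
            opened.add(fileNames.get(i));
            for (String event : eventsPerFile.get(i)) {
                if (event.equals("STOP")) {
                    skipAllOtherEvents = true;
                    break; // short-circuit the remaining events in this file
                }
            }
        }
        return opened;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("f1", "f2", "f3");
        List<List<String>> events = Arrays.asList(
            Arrays.asList("a"), Arrays.asList("b", "STOP", "c"), Arrays.asList("d"));
        System.out.println(scan(names, events));
    }
}
```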




 

> Remove need to copy over all events from attempt 1 to attempt 2 dir
> ---
>
> Key: TEZ-1909
> URL: https://issues.apache.org/jira/browse/TEZ-1909
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Jeff Zhang
> Attachments: TEZ-1909-1.patch, TEZ-1909-2.patch
>
>
> Use of file versions should prevent the need for copying over data into a 
> second attempt dir. Care needs to be taken to handle "last corrupt record" 
> handling. 





[jira] [Commented] (TEZ-2193) Check returned value from EdgeManagerPlugin before using it

2015-03-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361402#comment-14361402
 ] 

Bikas Saha commented on TEZ-2193:
-

Perhaps add a test similar to TestEdge#testOneToOneEdgeManager() for the 
scatter gather edge change.

getPhysicalOutput/Input... methods may be called multiple times when creating 
the tasks of a large vertex. It would help if the Preconditions message was not 
built with string concatenation every time (even though the check is going to 
pass almost always). Perhaps we can use a pre-assembled string here if we don't 
print the actual invalid value.
{code}
+  Preconditions.checkArgument(physicalOutputCount >= 0,
+  "PhysicalOutputCount should not be negative,"
+  + "physicalOutputCount=" + physicalOutputCount
+  + ", srcVertex=" + sourceVertex.getLogIdentifier()
+  + ", destVertex=" + destinationVertex.getLogIdentifier()
+  + ", EdgeManager=" + edgeManager.getClass().getName());
{code}
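
One way to avoid assembling the message on every call, sketched here with only the JDK: pass a template plus arguments and format them only when the check fails. Guava's Preconditions.checkArgument also has an overload that takes a message template with %s placeholders followed by arguments. `LazyCheck` below is a hypothetical stand-in and is not code from the patch. Note that the arguments themselves (for example the getLogIdentifier() calls) would still be evaluated on every call; only the string assembly is deferred.

```java
final class LazyCheck {
    /** Throws IllegalArgumentException, formatting the message only on failure. */
    static void checkArgument(boolean condition, String template, Object... args) {
        if (!condition) {
            throw new IllegalArgumentException(String.format(template, args));
        }
    }

    public static void main(String[] args) {
        int physicalOutputCount = -1; // illustrative invalid value
        try {
            checkArgument(physicalOutputCount >= 0,
                "PhysicalOutputCount should not be negative, physicalOutputCount=%s, srcVertex=%s",
                physicalOutputCount, "v1");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```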

Consumer task num can be 0 because a task in the source may not have any 
consumers in this edge but may have consumers on a different edge.
{code}
   srcTaskIndex);
+  Preconditions.checkArgument(numConsumers > 0,
+  "ConsumerTaskNum must be positive,"
+  + "numConsumers=" + numConsumers
{code}

> Check returned value from EdgeManagerPlugin before using it
> ---
>
> Key: TEZ-2193
> URL: https://issues.apache.org/jira/browse/TEZ-2193
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-2193-1.patch, TEZ-2193-2.patch, TEZ-2193-3.patch
>
>
> e.g. a DAG has vertices v1 and v2 with a shuffle edge between them, and v2 has 
> a custom vertex manager and -1 parallelism. In this case v1's output spec may 
> have -1 physical edges, which will cause the task to hang in TezChild.





[jira] [Updated] (TEZ-2199) updateLocalResourcesForInputSplits assumes wrongly that split data is on same FS as the default FS

2015-03-13 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2199:
-
Attachment: TEZ-2199.1.patch

[~sseth] review please.

> updateLocalResourcesForInputSplits assumes wrongly that split data is on same 
> FS as the default FS
> --
>
> Key: TEZ-2199
> URL: https://issues.apache.org/jira/browse/TEZ-2199
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
> Attachments: TEZ-2199.1.patch
>
>
> Seen in a Windows Azure scenario:
> Caused by: java.io.FileNotFoundException: 
> hdfs://namenode:9000/hive/scratch/_tez_scratch_dir/split_Map_1/job.split: No 
> such file or directory.
>   at 
> org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:1625)
>   at 
> org.apache.tez.mapreduce.hadoop.MRInputHelpers.updateLocalResourcesForInputSplits(MRInputHelpers.java:639)
>   at 
> org.apache.tez.mapreduce.hadoop.MRInputHelpers.configureMRInputWithLegacySplitGeneration(MRInputHelpers.java:115)





[jira] [Comment Edited] (TEZ-2021) Tez tool to analyze shuffle performance in large clusters by mining task logs

2015-03-13 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361379#comment-14361379
 ] 

Rajesh Balamohan edited comment on TEZ-2021 at 3/14/15 12:19 AM:
-

[~jeagles] This has been tested on a small cluster with 20 nodes.  It would be 
really helpful if you could try it out and provide your comments.

* Apply this patch.
* Build tez-tfile-parser in $TEZ/tez-tools/tez-tfile-parser/
** "mvn clean package"
* Populate env.sh in $TEZ/tez-tools/perf-analyzer/shuffle/
** PIG_HOME, TEZ_HOME
** YARN_APP_LOGS_LOCATION
*** "yarn.log-aggregation-enable" is set to true in the cluster
*** Note down the "yarn.nodemanager.remote-app-log-dir" and 
"yarn.nodemanager.remote-app-log-dir-suffix" parameters in your cluster and 
set YARN_APP_LOGS_LOCATION in env.sh appropriately
* This requires "gnuplot" on the machine where you are planning to run. 
* Run "sh gnuplot.sh " (In case you would like to parse some 
other user's job, you might want to set "export APP_USER=appUserWhoRanTheJob" 
before running this)


> Tez tool to analyze shuffle performance in large clusters by mining task logs
> -
>
> Key: TEZ-2021
> URL: https://issues.apache.org/jira/browse/TEZ-2021
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: TEZ-2021.1.patch, TEZ-2021.2.patch, 
> avg_time_Taken_after_fix.png, avg_time_taken_to_download.png, 
> no_of_times_contacted.png, total_data_transferred.png
>
>
> Tez tool to analyze shuffle performance in large clusters by mining task 
> logs. Provide an easier way to visualize (heat chart) and identify bad nodes 
> in large cluster.





[jira] [Commented] (TEZ-2021) Tez tool to analyze shuffle performance in large clusters by mining task logs

2015-03-13 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361379#comment-14361379
 ] 

Rajesh Balamohan commented on TEZ-2021:
---

This has been tested on a small cluster with 20 nodes.  It would be really 
helpful if you could try it out and provide your comments.

* Apply this patch.
* Build tez-tfile-parser in $TEZ/tez-tools/tez-tfile-parser/
** "mvn clean package"
* Populate env.sh in $TEZ/tez-tools/perf-analyzer/shuffle/
** PIG_HOME, TEZ_HOME
** YARN_APP_LOGS_LOCATION
*** "yarn.log-aggregation-enable" is set to true in the cluster
*** Note down the "yarn.nodemanager.remote-app-log-dir" and 
"yarn.nodemanager.remote-app-log-dir-suffix" parameters in your cluster and 
set YARN_APP_LOGS_LOCATION in env.sh appropriately
* This requires "gnuplot" on the machine where you are planning to run. 
* Run "sh gnuplot.sh " (In case you would like to parse some 
other user's job, you might want to set "export APP_USER=appUserWhoRanTheJob" 
before running this)

> Tez tool to analyze shuffle performance in large clusters by mining task logs
> -
>
> Key: TEZ-2021
> URL: https://issues.apache.org/jira/browse/TEZ-2021
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: TEZ-2021.1.patch, TEZ-2021.2.patch, 
> avg_time_Taken_after_fix.png, avg_time_taken_to_download.png, 
> no_of_times_contacted.png, total_data_transferred.png
>
>
> Tez tool to analyze shuffle performance in large clusters by mining task 
> logs. Provide an easier way to visualize (heat chart) and identify bad nodes 
> in large cluster.





[jira] [Comment Edited] (TEZ-160) Remove 5 second sleep at the end of AM completion.

2015-03-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361375#comment-14361375
 ] 

Bikas Saha edited comment on TEZ-160 at 3/14/15 12:17 AM:
--

This should affect you if your tests are not using session mode and running 1 
dag per AM. Is that the case?


was (Author: bikassaha):
This should affect you if your tests are not using session mode. Is that the 
case?

> Remove 5 second sleep at the end of AM completion.
> --
>
> Key: TEZ-160
> URL: https://issues.apache.org/jira/browse/TEZ-160
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Siddharth Seth
>  Labels: TEZ-0.2.0
>
> ClientServiceDelegate/DAGClient doesn't seem to be getting job completion 
> status from the AM after job completion. It, instead, always relies on the RM 
> for this information. The information returned by the AM should be used while 
> it's available.





[jira] [Comment Edited] (TEZ-160) Remove 5 second sleep at the end of AM completion.

2015-03-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361375#comment-14361375
 ] 

Bikas Saha edited comment on TEZ-160 at 3/14/15 12:18 AM:
--

This only happens at AM shutdown, not DAG completion. This should affect you if 
your tests are not using session mode and running 1 dag per AM. Is that the 
case?


was (Author: bikassaha):
This should affect you if your tests are not using session mode and running 1 
dag per AM. Is that the case?

> Remove 5 second sleep at the end of AM completion.
> --
>
> Key: TEZ-160
> URL: https://issues.apache.org/jira/browse/TEZ-160
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Siddharth Seth
>  Labels: TEZ-0.2.0
>
> ClientServiceDelegate/DAGClient doesn't seem to be getting job completion 
> status from the AM after job completion. It, instead, always relies on the RM 
> for this information. The information returned by the AM should be used while 
> it's available.





[jira] [Commented] (TEZ-160) Remove 5 second sleep at the end of AM completion.

2015-03-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361375#comment-14361375
 ] 

Bikas Saha commented on TEZ-160:


This should affect you if your tests are not using session mode. Is that the 
case?

> Remove 5 second sleep at the end of AM completion.
> --
>
> Key: TEZ-160
> URL: https://issues.apache.org/jira/browse/TEZ-160
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Siddharth Seth
>  Labels: TEZ-0.2.0
>
> ClientServiceDelegate/DAGClient doesn't seem to be getting job completion 
> status from the AM after job completion. It, instead, always relies on the RM 
> for this information. The information returned by the AM should be used while 
> it's available.





[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents

2015-03-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361273#comment-14361273
 ] 

Bikas Saha commented on TEZ-776:


All options have been sufficiently discussed on this thread and offline. The 
option of moving all of event handling to edge plugins is a much larger change 
that shifts a lot of framework responsibility to the user. It's also not clear 
how future changes/feature additions around dynamic graph reconfiguration, 
like changing edges and vertices at runtime, may or may not be affected by 
having given control of event management to user code. Things like event 
obsoletion, which can be done easily by the framework for all edges and IOs, 
would need to be done by every plugin. Every plugin would need to have 
additional metadata tracking objects which are currently provided by the 
framework. Each plugin would have to handle versioning of events and 
speculation-like conditions which break the time-sequential nature of version 
numbers. And probably other stuff.

Firstly, that is a much larger change that is related but orthogonal to the 
memory issue and must be discussed separately in its own right. Secondly, 
while at a high level it may seem likely that in some cases edge plugins might 
do better at CPU, I suspect that after handling event versioning, obsoletion, 
etc., the argument that plugins can avoid iterating over events may turn out to 
be specious for CPU efficiency. My suggestion to follow up on that approach 
separately is based on the above arguments. It has not been effectively 
established that moving essential framework responsibilities to the user is 
the right approach long term. Neither is it clear that the CPU efficiency of 
the final implementation, which does more than the sunny-day scenario, is going 
to be significantly better at the cost of adding complexity in user code. That 
can only be measured. Hence, I suggested that it be evaluated before including 
that change in the project. This is the case with any change or feature, right? 
My only objection was to tying the progress on this jira to pre-accepting the 
other changes without going through due process, especially when this jira 
does not mandate any user code changes.

In the meanwhile, the current patch does not mandate any API changes for users. 
Unless users want to make the API change, they can continue to use the existing 
API, even across releases. If they do want to make the change, it's much 
simpler because it follows the existing pattern. But users who are running 
large jobs and using framework built-in components can be unblocked on their 
scalability issues. Hence my suggestion to complete the reviews of this patch 
and resolve it so that there is forward progress without requiring any user to 
make any code changes.

In order to make progress, what I can try to do is limit the on-demand routing 
to only composite event expansion and not change the flow for any other event. 
I would add a new optional API for composite event expansion that will be 
implemented by the internal scatter-gather edge; because it is optional, users 
don't need to change their code. This will solve the memory scalability issue 
without increasing any CPU cost compared to any scenario as it exists today.
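
To make the proposal above concrete, here is a toy sketch of on-demand composite event expansion, under the assumption that a composite event from one source task covers one partition per consumer task. Eager expansion holds sources x consumers events, while on-demand expansion holds one composite per source and builds a consumer's events only when asked. All class and method names are hypothetical and the model is far simpler than the real Tez event types. The 64-byte figure is the one quoted in the issue description, and the task counts are made up.

```java
import java.util.ArrayList;
import java.util.List;

final class OnDemandExpansionSketch {
    /** Toy composite event: one source task producing `count` partitions. */
    static final class Composite {
        final int sourceTask;
        final int count;

        Composite(int sourceTask, int count) {
            this.sourceTask = sourceTask;
            this.count = count;
        }
    }

    /** Builds the events one consumer needs from the stored composites. */
    static List<String> eventsFor(List<Composite> stored, int consumerTask) {
        List<String> events = new ArrayList<>();
        for (Composite c : stored) {
            if (consumerTask < c.count) {
                events.add("src=" + c.sourceTask + ",partition=" + consumerTask);
            }
        }
        return events;
    }

    /** Bytes held if every composite were expanded up front. */
    static long eagerBytes(long sources, long consumers, long bytesPerEvent) {
        return sources * consumers * bytesPerEvent;
    }

    public static void main(String[] args) {
        List<Composite> stored = new ArrayList<>();
        for (int s = 0; s < 3; s++) {
            stored.add(new Composite(s, 4));
        }
        System.out.println(eventsFor(stored, 2));
        System.out.println(eagerBytes(10_000, 10_000, 64) + " bytes if expanded eagerly");
    }
}
```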

I hope that clarifies and we can make progress on this jira.

> Reduce AM mem usage caused by storing TezEvents
> ---
>
> Key: TEZ-776
> URL: https://issues.apache.org/jira/browse/TEZ-776
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Bikas Saha
> Attachments: TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, 
> TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, 
> TEZ-776.ondemand.6.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, 
> With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, 
> events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, 
> without_patch_jmc_output_of_AM.png
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically 
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks 
> that can be processed.





[jira] [Commented] (TEZ-2064) SessionNotRunning Exception not thrown is all cases

2015-03-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361170#comment-14361170
 ] 

Hadoop QA commented on TEZ-2064:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12698474/TEZ-2064.2.patch
  against master revision a809f96.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/302//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/302//console

This message is automatically generated.

> SessionNotRunning Exception not thrown is all cases
> ---
>
> Key: TEZ-2064
> URL: https://issues.apache.org/jira/browse/TEZ-2064
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Critical
> Attachments: TEZ-2064.1.patch, TEZ-2064.2.patch
>
>
> Hive handles SessionNotRunning during submitDAG() and restarts the tez-session
> if it receives one. In YHIVE-15, we did not receive that and the query 
> failed. In some scenarios the Application will fall out of the RM's knowledge 
> and an ApplicationNotFound exception is received instead.
> Here are my asks.
> 1. TezClient.submitDAG()/stop() should return SessionNotRunning exception if
> application is expired. Basically any API which currently returns
> SessionNotRunning should handle the app-not-found scenario.
> 2. It would help if TezClient.getAppMasterStatus() can return
> TezAppMasterStatus.SHUTDOWN if tez-session-application does not exist in RM.
> That way, as a precaution, applications could check before submitting DAG's.
> 3. I think it might be better if verifySessionStateForSubmission() checks the
> app Status every time instead of checking sessionStarted. I am not sure about
> side-effects, but will leave that to your decision.
> If 3 takes time, we can pursue that later. It would really help to get 1 & 2 
> in the next tez release, especially for busy grids.
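
Ask 1 above amounts to translating an application-not-found error into the exception that callers already handle. A minimal sketch of that translation follows, using hypothetical stand-in exception classes rather than the real YARN and Tez types; `SessionErrorTranslation` and `call` are illustrative names only.

```java
import java.util.concurrent.Callable;

final class SessionErrorTranslation {
    /** Stand-in for the RM's "application not found" error. */
    static final class AppNotFoundException extends Exception {
        AppNotFoundException(String message) {
            super(message);
        }
    }

    /** Stand-in for the exception that callers such as Hive already handle. */
    static final class SessionNotRunningException extends Exception {
        SessionNotRunningException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Runs a client call and reports a missing application as a non-running session. */
    static <T> T call(Callable<T> clientCall) throws Exception {
        try {
            return clientCall.call();
        } catch (AppNotFoundException e) {
            throw new SessionNotRunningException(
                "Session application is no longer known to the RM", e);
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            call(() -> {
                throw new AppNotFoundException("application not found");
            });
        } catch (SessionNotRunningException e) {
            // A caller could restart the session here.
            System.out.println("restart session: " + e.getMessage());
        }
    }
}
```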





Failed: TEZ-2064 PreCommit Build #302

2015-03-13 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2064
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/302/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2750 lines...]




==
==
Adding comment to Jira.
==
==


Comment added.
159038a8b30e33d20ce11bc080b3ea5f7c3959b0 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #297
Archived 44 artifacts
Archive block size is 32768
Received 4 blocks and 2590348 bytes
Compression is 4.8%
Took 0.86 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Assigned] (TEZ-2199) updateLocalResourcesForInputSplits assumes wrongly that split data is on same FS as the default FS

2015-03-13 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah reassigned TEZ-2199:


Assignee: Hitesh Shah

> updateLocalResourcesForInputSplits assumes wrongly that split data is on same 
> FS as the default FS
> --
>
> Key: TEZ-2199
> URL: https://issues.apache.org/jira/browse/TEZ-2199
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
>
> Seen in a Windows Azure scenario:
> Caused by: java.io.FileNotFoundException: 
> hdfs://namenode:9000/hive/scratch/_tez_scratch_dir/split_Map_1/job.split: No 
> such file or directory.
>   at 
> org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:1625)
>   at 
> org.apache.tez.mapreduce.hadoop.MRInputHelpers.updateLocalResourcesForInputSplits(MRInputHelpers.java:639)
>   at 
> org.apache.tez.mapreduce.hadoop.MRInputHelpers.configureMRInputWithLegacySplitGeneration(MRInputHelpers.java:115)





[jira] [Created] (TEZ-2199) updateLocalResourcesForInputSplits assumes wrongly that split data is on same FS as the default FS

2015-03-13 Thread Hitesh Shah (JIRA)
Hitesh Shah created TEZ-2199:


 Summary: updateLocalResourcesForInputSplits assumes wrongly that 
split data is on same FS as the default FS
 Key: TEZ-2199
 URL: https://issues.apache.org/jira/browse/TEZ-2199
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah


Seen in a Windows Azure scenario:

Caused by: java.io.FileNotFoundException: 
hdfs://namenode:9000/hive/scratch/_tez_scratch_dir/split_Map_1/job.split: No 
such file or directory.
at 
org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:1625)
at 
org.apache.tez.mapreduce.hadoop.MRInputHelpers.updateLocalResourcesForInputSplits(MRInputHelpers.java:639)
at 
org.apache.tez.mapreduce.hadoop.MRInputHelpers.configureMRInputWithLegacySplitGeneration(MRInputHelpers.java:115)






[jira] [Commented] (TEZ-2064) SessionNotRunning Exception not thrown is all cases

2015-03-13 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361102#comment-14361102
 ] 

Hitesh Shah commented on TEZ-2064:
--

Triggered pre-commit build.

> SessionNotRunning Exception not thrown is all cases
> ---
>
> Key: TEZ-2064
> URL: https://issues.apache.org/jira/browse/TEZ-2064
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Critical
> Attachments: TEZ-2064.1.patch, TEZ-2064.2.patch
>
>
> Hive handles SessionNotRunning during submitDAG() and restarts the tez-session
> if it receives one. In YHIVE-15, we did not receive that and the query 
> failed. In some scenarios the Application will fall out of the RM's knowledge 
> and an ApplicationNotFound exception is received instead.
> Here are my asks.
> 1. TezClient.submitDAG()/stop() should return SessionNotRunning exception if
> application is expired. Basically any API which currently returns
> SessionNotRunning should handle the app-not-found scenario.
> 2. It would help if TezClient.getAppMasterStatus() can return
> TezAppMasterStatus.SHUTDOWN if tez-session-application does not exist in RM.
> That way, as a precaution, applications could check before submitting DAG's.
> 3. I think it might be better if verifySessionStateForSubmission() checks the
> app Status every time instead of checking sessionStarted. I am not sure about
> side-effects, but will leave that to your decision.
> If 3 takes time, we can pursue that later. It would really help to get 1 & 2 
> in the next tez release, especially for busy grids.





[jira] [Commented] (TEZ-2021) Tez tool to analyze shuffle performance in large clusters by mining task logs

2015-03-13 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361020#comment-14361020
 ] 

Jonathan Eagles commented on TEZ-2021:
--

I haven't seen any recent updates to this ticket. Is this tool in good shape?

> Tez tool to analyze shuffle performance in large clusters by mining task logs
> -
>
> Key: TEZ-2021
> URL: https://issues.apache.org/jira/browse/TEZ-2021
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: TEZ-2021.1.patch, TEZ-2021.2.patch, 
> avg_time_Taken_after_fix.png, avg_time_taken_to_download.png, 
> no_of_times_contacted.png, total_data_transferred.png
>
>
> Tez tool to analyze shuffle performance in large clusters by mining task 
> logs. Provide an easier way to visualize (heat chart) and identify bad nodes 
> in large cluster.





[jira] [Commented] (TEZ-2064) SessionNotRunning Exception not thrown is all cases

2015-03-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360861#comment-14360861
 ] 

Hadoop QA commented on TEZ-2064:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12698474/TEZ-2064.2.patch
  against master revision a809f96.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestAMRecovery

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/301//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/301//console

This message is automatically generated.

> SessionNotRunning Exception not thrown is all cases
> ---
>
> Key: TEZ-2064
> URL: https://issues.apache.org/jira/browse/TEZ-2064
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Critical
> Attachments: TEZ-2064.1.patch, TEZ-2064.2.patch
>
>
> Hive handles SessionNotRunning during submitDAG() and restarts the tez-session
> if it receives one. In YHIVE-15, we did not receive that and the query 
> failed. In some scenarios the Application will fall out of the RM's knowledge 
> and an ApplicationNotFound exception is received instead.
> Here are my asks.
> 1. TezClient.submitDAG()/stop() should return SessionNotRunning exception if
> application is expired. Basically any API which currently returns
> SessionNotRunning should handle the app-not-found scenario.
> 2. It would help if TezClient.getAppMasterStatus() can return
> TezAppMasterStatus.SHUTDOWN if tez-session-application does not exist in RM.
> That way, as a precaution, applications could check before submitting DAG's.
> 3. I think it might be better if verifySessionStateForSubmission() checks the
> app Status every time instead of checking sessionStarted. I am not sure about
> side-effects, but will leave that to your decision.
> If 3 takes time, we can pursue that later. It would really help to get 1 & 2 in
> the next tez release, especially for busy grids.
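
A rough illustration of ask 1 above, written as an editorial sketch rather than Tez code: a submission call is wrapped so that an application-not-found failure reaches the caller as a session-not-running failure. The types AppNotFound, SessionNotRunning and Submission, and the method translate, are hypothetical stand-ins defined only for this example; they are not the real YARN or Tez classes.

```java
// Editorial sketch only: all types here are hypothetical stand-ins.
final class SessionSubmitSketch {
    private SessionSubmitSketch() {}

    /** Stand-in for the "application not found" failure reported by the RM. */
    static final class AppNotFound extends Exception {
        AppNotFound(String message) { super(message); }
    }

    /** Stand-in for the failure that callers such as Hive already handle. */
    static final class SessionNotRunning extends Exception {
        SessionNotRunning(String message, Throwable cause) { super(message, cause); }
    }

    /** Any client call that may discover the application is gone. */
    interface Submission<T> {
        T run() throws AppNotFound;
    }

    /** Runs the call and maps AppNotFound to SessionNotRunning, keeping the cause. */
    static <T> T translate(Submission<T> call) throws SessionNotRunning {
        try {
            return call.run();
        } catch (AppNotFound e) {
            throw new SessionNotRunning("session application is no longer known to the RM", e);
        }
    }
}
```

With a wrapper of this shape, a caller that already restarts the session on SessionNotRunning would need no separate handling for the app-not-found case.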





Failed: TEZ-2064 PreCommit Build #301

2015-03-13 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2064
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/301/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2536 lines...]


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12698474/TEZ-2064.2.patch
  against master revision a809f96.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestAMRecovery

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/301//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/301//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
cda39fff49b136e145b567c990028cad4831b8fc logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #297
Archived 44 artifacts
Archive block size is 32768
Received 4 blocks and 2588594 bytes
Compression is 4.8%
Took 1.4 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  
org.apache.tez.test.TestAMRecovery.testVertexCompletelyFinished_Broadcast

Error Message:
File does not exist: 
/user/jenkins/target/org.apache.tez.test.TestAMRecovery-tmpDir/14711/.tez/application_1426269594468_0007/recovery/2/dag_1426269594468_0007_1.recovery
 at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
 at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1891)
 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1832)
 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1812)
 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1784)
 at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:542)
 at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:362)
 at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)


Stack Trace:
java.io.FileNotFoundException: File does not exist: 
/user/jenkins/target/org.apache.tez.test.TestAMRecovery-tmpDir/14711/.tez/application_1426269594468_0007/recovery/2/dag_1426269594468_0007_1.recovery
at 
org.apache.hadoop.h

[jira] [Commented] (TEZ-2191) Simulation improvements to MockDAGAppMaster

2015-03-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360798#comment-14360798
 ] 

Bikas Saha commented on TEZ-2191:
-

Thanks! Yes. They are there right now because I pulled the code from the memory 
events testing patch. When that test goes in, these will be used. Yes, the 
accuracy is intentional: storing it in ms often leads to 0 because the 
numbers are small per invocation.

> Simulation improvements to MockDAGAppMaster
> ---
>
> Key: TEZ-2191
> URL: https://issues.apache.org/jira/browse/TEZ-2191
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Attachments: TEZ-2191.1.patch, TEZ-2191.2.patch, TEZ-2191.3.patch
>
>






Failed: TEZ-2064 PreCommit Build #300

2015-03-13 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2064
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/300/

###
## LAST 60 LINES OF THE CONSOLE 
###
Started by user jeagles
Building remotely on H7 (Mapreduce Falcon Hadoop Pig Zookeeper Tez Hdfs) in 
workspace /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://git-wip-us.apache.org/repos/asf/tez.git 
 > # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/tez.git
 > git --version # timeout=10
 > git fetch --tags --progress https://git-wip-us.apache.org/repos/asf/tez.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/master^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/master^{commit} # timeout=10
Checking out Revision a809f96c6e6c7bfe8f683980713bff5bfe373419 
(refs/remotes/origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f a809f96c6e6c7bfe8f683980713bff5bfe373419
 > git rev-list 55d7fce0608506543eb6bbf53177b16c7f017e5b # timeout=10
No emails were triggered.
[PreCommit-TEZ-Build] $ /bin/bash /tmp/hudson7683334766425930884.sh
Running in Jenkins mode


==
==
Testing patch for TEZ-2064.
==
==


HEAD is now at a809f96 TEZ-2189. Tez UI live AM tracking url only works for 
localhost addresses (jeagles)
Previous HEAD position was a809f96... TEZ-2189. Tez UI live AM tracking url 
only works for localhost addresses (jeagles)
Switched to branch 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)
First, rewinding head to replay your work on top of it...
Fast-forwarded master to a809f96c6e6c7bfe8f683980713bff5bfe373419.
TEZ-2064 is not "Patch Available".  Exiting.


==
==
Finished build.
==
==


Archiving artifacts
ERROR: No artifacts found that match the file pattern "patchprocess/*.*". 
Configuration error?
ERROR: 'patchprocess/*.*' doesn't match anything, but '*.*' does. Perhaps 
that's what you mean?
Build step 'Archive the artifacts' changed build result to FAILURE
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
No tests ran.

[jira] [Commented] (TEZ-2189) Tez UI live AM tracking url only works for localhost addresses

2015-03-13 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360639#comment-14360639
 ] 

Hitesh Shah commented on TEZ-2189:
--

+1. Test failure is unrelated.

> Tez UI live AM tracking url only works for localhost addresses
> --
>
> Key: TEZ-2189
> URL: https://issues.apache.org/jira/browse/TEZ-2189
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: TEZ-2189.1.patch, TEZ-2189.2.patch, TEZ-2189.3.patch, 
> TEZ-2189.4.patch, TEZ-2189.5.patch, TEZ-2189.6.patch
>
>






Failed: TEZ-2189 PreCommit Build #299

2015-03-13 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2189
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/299/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 1848 lines...]




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12704452/TEZ-2189.6.patch
  against master revision 55d7fce.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.client.TestTezClient

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/299//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/299//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
d407794a7911272faa4c39072feea1b54ea0d853 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #297
Archived 44 artifacts
Archive block size is 32768
Received 4 blocks and 2534881 bytes
Compression is 4.9%
Took 1.2 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
1 tests failed.
REGRESSION:  org.apache.tez.client.TestTezClient.testTezclientSession

Error Message:
test timed out after 5000 milliseconds

Stack Trace:
java.lang.Exception: test timed out after 5000 milliseconds
at java.net.PlainDatagramSocketImpl.receive0(Native Method)
at 
java.net.AbstractPlainDatagramSocketImpl.receive(AbstractPlainDatagramSocketImpl.java:145)
at java.net.DatagramSocket.receive(DatagramSocket.java:786)
at com.sun.jndi.dns.DnsClient.doUdpQuery(DnsClient.java:416)
at com.sun.jndi.dns.DnsClient.query(DnsClient.java:210)
at com.sun.jndi.dns.Resolver.query(Resolver.java:81)
at com.sun.jndi.dns.DnsContext.c_getAttributes(DnsContext.java:430)
at 
com.sun.jndi.toolkit.ctx.ComponentDirContext.p_getAttributes(ComponentDirContext.java:231)
at 
com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.getAttributes(PartialCompositeDirContext.java:139)
at 
com.sun.jndi.toolkit.url.GenericURLDirContext.getAttributes(GenericURLDirContext.java:103)
at 
sun.security.krb5.KrbServiceLocator.getKerberosService(KrbServiceLocator.java:87)
at sun.security.krb5.Config.checkRealm(Config.java:1295)
at sun.security.krb5.Config.getRealmFromDNS(Config.java:1268)
at sun.security.krb5.Config.getDefaultRealm(Config.java:1162)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:84)
at 
org.apache.hadoop.security.authentication.util.KerberosName.<init>(KerberosName.java:86)
at 
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:261)
at 
org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:248)
at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:763)
at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(

[jira] [Commented] (TEZ-2189) Tez UI live AM tracking url only works for localhost addresses

2015-03-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360598#comment-14360598
 ] 

Hadoop QA commented on TEZ-2189:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12704452/TEZ-2189.6.patch
  against master revision 55d7fce.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.client.TestTezClient

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/299//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/299//console

This message is automatically generated.

> Tez UI live AM tracking url only works for localhost addresses
> --
>
> Key: TEZ-2189
> URL: https://issues.apache.org/jira/browse/TEZ-2189
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: TEZ-2189.1.patch, TEZ-2189.2.patch, TEZ-2189.3.patch, 
> TEZ-2189.4.patch, TEZ-2189.5.patch, TEZ-2189.6.patch
>
>






[jira] [Commented] (TEZ-2061) Tez UI: vertex id column and filter on tasks page should be changed to vertex name

2015-03-13 Thread Sreenath Somarajapuram (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360585#comment-14360585
 ] 

Sreenath Somarajapuram commented on TEZ-2061:
-

+1 LGTM

> Tez UI: vertex id column and filter on tasks page should be changed to vertex 
> name
> --
>
> Key: TEZ-2061
> URL: https://issues.apache.org/jira/browse/TEZ-2061
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Prakash Ramachandran
>Assignee: Prakash Ramachandran
> Attachments: TEZ-2061.1.patch, TEZ-2061.2.patch
>
>
> VertexId search box is not really useful unless one types in the whole vertex 
> id. At some point later, vertex name might be a better option. May need 
> backend changes or could be done on the UI with an additional call to convert 
> name to id from the dag info.





[jira] [Updated] (TEZ-2189) Tez UI live AM tracking url only works for localhost addresses

2015-03-13 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-2189:
-
Attachment: TEZ-2189.6.patch

[~hitesh], addressed the https issue and added a test case for missing scheme.

> Tez UI live AM tracking url only works for localhost addresses
> --
>
> Key: TEZ-2189
> URL: https://issues.apache.org/jira/browse/TEZ-2189
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: TEZ-2189.1.patch, TEZ-2189.2.patch, TEZ-2189.3.patch, 
> TEZ-2189.4.patch, TEZ-2189.5.patch, TEZ-2189.6.patch
>
>






Failed: TEZ-2061 PreCommit Build #298

2015-03-13 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2061
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/298/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2752 lines...]



{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12704439/TEZ-2061.2.patch
  against master revision 55d7fce.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/298//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/298//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
d57312d264b5cf6343f9c29fc172c2782c539f91 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #297
Archived 44 artifacts
Archive block size is 32768
Received 6 blocks and 2530589 bytes
Compression is 7.2%
Took 0.97 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2061) Tez UI: vertex id column and filter on tasks page should be changed to vertex name

2015-03-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360517#comment-14360517
 ] 

Hadoop QA commented on TEZ-2061:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12704439/TEZ-2061.2.patch
  against master revision 55d7fce.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/298//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/298//console

This message is automatically generated.

> Tez UI: vertex id column and filter on tasks page should be changed to vertex 
> name
> --
>
> Key: TEZ-2061
> URL: https://issues.apache.org/jira/browse/TEZ-2061
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Prakash Ramachandran
>Assignee: Prakash Ramachandran
> Attachments: TEZ-2061.1.patch, TEZ-2061.2.patch
>
>
> VertexId search box is not really useful unless one types in the whole vertex 
> id. At some point later, vertex name might be a better option. May need 
> backend changes or could be done on the UI with an additional call to convert 
> name to id from the dag info.





[jira] [Updated] (TEZ-2061) Tez UI: vertex id column and filter on tasks page should be changed to vertex name

2015-03-13 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-2061:
--
Attachment: TEZ-2061.2.patch

addressed comments 

> Tez UI: vertex id column and filter on tasks page should be changed to vertex 
> name
> --
>
> Key: TEZ-2061
> URL: https://issues.apache.org/jira/browse/TEZ-2061
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Prakash Ramachandran
>Assignee: Prakash Ramachandran
> Attachments: TEZ-2061.1.patch, TEZ-2061.2.patch
>
>
> VertexId search box is not really useful unless one types in the whole vertex 
> id. At some point later, vertex name might be a better option. May need 
> backend changes or could be done on the UI with an additional call to convert 
> name to id from the dag info.





[jira] [Commented] (TEZ-160) Remove 5 second sleep at the end of AM completion.

2015-03-13 Thread JIRA

[ 
https://issues.apache.org/jira/browse/TEZ-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360423#comment-14360423
 ] 

André Kelpe commented on TEZ-160:
-

Could the sleep period be made configurable until this is fixed correctly? We 
have a test suite with a few thousand dags and waiting 5 extra seconds for 
every one of them adds a lot of wall-clock time.

> Remove 5 second sleep at the end of AM completion.
> --
>
> Key: TEZ-160
> URL: https://issues.apache.org/jira/browse/TEZ-160
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Siddharth Seth
>  Labels: TEZ-0.2.0
>
> ClientServiceDelegate/DAGClient doesn't seem to be getting job completion 
> status from the AM after job completion. It, instead, always relies on the RM 
> for this information. The information returned by the AM should be used while 
> it's available.





[jira] [Commented] (TEZ-2061) Tez UI: vertex id column and filter on tasks page should be changed to vertex name

2015-03-13 Thread Sreenath Somarajapuram (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360418#comment-14360418
 ] 

Sreenath Somarajapuram commented on TEZ-2061:
-

Please add vertex name to tasks_controller.js also.

> Tez UI: vertex id column and filter on tasks page should be changed to vertex 
> name
> --
>
> Key: TEZ-2061
> URL: https://issues.apache.org/jira/browse/TEZ-2061
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Prakash Ramachandran
>Assignee: Prakash Ramachandran
> Attachments: TEZ-2061.1.patch
>
>
> VertexId search box is not really useful unless one types in the whole vertex 
> id. At some point later, vertex name might be a better option. May need 
> backend changes or could be done on the UI with an additional call to convert 
> name to id from the dag info.





[jira] [Commented] (TEZ-2189) Tez UI live AM tracking url only works for localhost addresses

2015-03-13 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360295#comment-14360295
 ] 

Hitesh Shah commented on TEZ-2189:
--

Minor nit:

{code}
if (!historyUrl.isEmpty() && !historyUrl.startsWith("http://")) {
{code}
  - above doesn't handle https 
  - we should either just check startsWith "http" instead of "http://" or 
convert to URI, check for presence/absence of a scheme before prefixing http as 
a default? 

Future jira:
  - AM webapp tracking url does not account for running with https enabled. We 
hardcode the tracking url to use http.
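
The scheme check raised in the nit above could be sketched as follows. This is an editorial illustration under stated assumptions, not the code from any TEZ-2189 patch: the class HistoryUrlSketch and the method withDefaultScheme are hypothetical names, and the sketch takes the simpler of the two options mentioned (a prefix check that accepts both http and https) rather than parsing the value as a URI.

```java
import java.util.Locale;

// Editorial sketch only: class and method names are hypothetical.
final class HistoryUrlSketch {
    private HistoryUrlSketch() {}

    /**
     * Returns the URL unchanged if it already starts with http:// or https://
     * (case-insensitive); otherwise prefixes http:// as the default scheme.
     * Null and empty values are returned as-is.
     */
    static String withDefaultScheme(String historyUrl) {
        if (historyUrl == null || historyUrl.isEmpty()) {
            return historyUrl;
        }
        String lower = historyUrl.toLowerCase(Locale.ROOT);
        if (lower.startsWith("http://") || lower.startsWith("https://")) {
            return historyUrl;
        }
        return "http://" + historyUrl;
    }
}
```

For example, withDefaultScheme("host:8080/ui") yields "http://host:8080/ui", while a value that already begins with https:// passes through untouched.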



> Tez UI live AM tracking url only works for localhost addresses
> --
>
> Key: TEZ-2189
> URL: https://issues.apache.org/jira/browse/TEZ-2189
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: TEZ-2189.1.patch, TEZ-2189.2.patch, TEZ-2189.3.patch, 
> TEZ-2189.4.patch, TEZ-2189.5.patch
>
>






[jira] [Updated] (TEZ-2061) Tez UI: vertex id column and filter on tasks page should be changed to vertex name

2015-03-13 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-2061:
--
Attachment: TEZ-2061.1.patch

* changed the column for task and task attempts on dag page to show vertex name 
instead of vertex id.
* changed search to search by name instead of id.

[~Sreenath] can you review?

> Tez UI: vertex id column and filter on tasks page should be changed to vertex 
> name
> --
>
> Key: TEZ-2061
> URL: https://issues.apache.org/jira/browse/TEZ-2061
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Prakash Ramachandran
>Assignee: Prakash Ramachandran
> Attachments: TEZ-2061.1.patch
>
>
> VertexId search box is not really useful unless one types in the whole vertex 
> id. At some point later, vertex name might be a better option. May need 
> backend changes or could be done on the UI with an additional call to convert 
> name to id from the dag info.





[jira] [Commented] (TEZ-2198) Fix sorter spill counts

2015-03-13 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360173#comment-14360173
 ] 

Gopal V commented on TEZ-2198:
--

That exact update means that a clear recommendation on whether to use this 
optimization can be made by simply checking ADDITIONAL_SPILL_COUNT: once the 
optimization is active, ADDITIONAL_SPILL_COUNT will always be zero.

That makes it easy to check whether pipelined-shuffle is active & to predict 
whether it adds any benefit for a given case.

> Fix sorter spill counts
> ---
>
> Key: TEZ-2198
> URL: https://issues.apache.org/jira/browse/TEZ-2198
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>
> Prior to pipelined shuffle, tez merged all spilled data into a single file.  
> This ended up creating one index file and one output file. In this context, 
> TaskCounter.ADDITIONAL_SPILL_COUNT was referred as the number of additional 
> spills and there was no counter needed to track the number of merges.
> With pipelined shuffle, there is no final merge and ADDITIONAL_SPILL_COUNT 
> would be misleading, as these spills are direct output files which are 
> consumed by the consumers.
> It would be good to have the following 
> - ADDITIONAL_SPILL_COUNT: represents the spills that are needed by the task 
> to generate the final merged output
> - TOTAL_SPILLS: represents the total number of shuffle directories (index + 
> output files) that got created at the end of processing.
> For e.g, Assume sorter generated 5 spills in an attempt
> Without pipelining:
> ==
> ADDITIONAL_SPILL_COUNT = 5 <-- Additional spills involved in sorting
> TOTAL_SPILLS = 1 <-- Final merged output
> With pipelining:
> 
> ADDITIONAL_SPILL_COUNT = 5 <-- Additional spills involved in sorting
> TOTAL_SPILLS = 0 <--- No final output





[jira] [Commented] (TEZ-2198) Fix sorter spill counts

2015-03-13 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360170#comment-14360170
 ] 

Gopal V commented on TEZ-2198:
--

[~rajesh.balamohan]: The example seems to not match the description.

It should be

With pipelining :
===
ADDITIONAL_SPILL_COUNT = 0 <-- Additional spills involved in sorting
TOTAL_SPILL_COUNT = 5 <--- All spills in the task are final

The easier thing to remember is that ADDITIONAL_SPILL_COUNT includes only 
spills which are read by the same task that produced the spill, because they 
are additional read IO in the output phase.

The TOTAL_SPILL_COUNT is the number of files being offered via shuffle-handler 
(indirectly related to the number of DME events & shuffle fetcher requests).
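
The counter semantics described in this comment can be written down as a small editorial sketch. It only encodes the numbers discussed in the thread (5 spills give ADDITIONAL = 5, TOTAL = 1 without pipelining, and ADDITIONAL = 0, TOTAL = 5 with pipelining); the class SpillCounterSketch and its method forAttempt are hypothetical and are not part of Tez, and the sketch assumes at least one spill per attempt.

```java
// Editorial sketch only: models the two counters as described in the thread.
final class SpillCounterSketch {
    /** Spills re-read by the task that produced them (extra read IO). */
    final int additionalSpillCount;
    /** Outputs offered to consumers via the shuffle handler. */
    final int totalSpillCount;

    private SpillCounterSketch(int additionalSpillCount, int totalSpillCount) {
        this.additionalSpillCount = additionalSpillCount;
        this.totalSpillCount = totalSpillCount;
    }

    /** Derives both counters from the number of spills the sorter produced. */
    static SpillCounterSketch forAttempt(int spills, boolean pipelined) {
        if (spills < 1) {
            throw new IllegalArgumentException("expected at least one spill");
        }
        if (pipelined) {
            // No final merge: every spill is itself a final output.
            return new SpillCounterSketch(0, spills);
        }
        // All spills are merged by the same task into one final output.
        return new SpillCounterSketch(spills, 1);
    }
}
```

Under this model, a non-zero additionalSpillCount is exactly the case where the producing task pays for a merge pass, which is the check the follow-up comment relies on.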

> Fix sorter spill counts
> ---
>
> Key: TEZ-2198
> URL: https://issues.apache.org/jira/browse/TEZ-2198
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>
> Prior to pipelined shuffle, Tez merged all spilled data into a single file.  
> This ended up creating one index file and one output file. In this context, 
> TaskCounter.ADDITIONAL_SPILL_COUNT was referred to as the number of additional 
> spills, and no counter was needed to track the number of merges.
> With pipelined shuffle, there is no final merge, and ADDITIONAL_SPILL_COUNT 
> would be misleading, as these spills are direct output files that are 
> consumed by the consumers.
> It would be good to have the following:
> - ADDITIONAL_SPILL_COUNT: represents the spills that are needed by the task 
> to generate the final merged output
> - TOTAL_SPILLS: represents the total number of shuffle directories (index + 
> output files) that got created at the end of processing.
> For example, assume the sorter generated 5 spills in an attempt.
> Without pipelining:
> ===================
> ADDITIONAL_SPILL_COUNT = 5 <-- Additional spills involved in sorting
> TOTAL_SPILLS = 1 <-- Final merged output
> With pipelining:
> ================
> ADDITIONAL_SPILL_COUNT = 5 <-- Additional spills involved in sorting
> TOTAL_SPILLS = 0 <-- No final output





[jira] [Commented] (TEZ-1421) MRCombiner throws NPE in MapredWordCount on master branch

2015-03-13 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360111#comment-14360111
 ] 

Tsuyoshi Ozawa commented on TEZ-1421:
-

I've investigated this in depth: the bug happens when TEZ_RUNTIME_COMBINER_CLASS 
is set but MRJobConfig.COMBINE_CLASS_ATTR or "mapred.combiner.class" is null. 
I'll check the MRHelpers code.
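
The failure mode described above could be guarded against along the lines of the following Java sketch. This is not the actual fix and not Tez or Hadoop code: CombinerLookup and resolveMrCombiner are hypothetical names, a plain Map stands in for the job configuration, and the key strings other than "mapred.combiner.class" are assumptions about what the named constants resolve to.

```java
import java.util.Map;

/** Hypothetical sketch (not Tez code) of a combiner lookup that fails with a clear message. */
final class CombinerLookup {
    // Assumed value of TEZ_RUNTIME_COMBINER_CLASS; not confirmed by this thread.
    static final String TEZ_COMBINER_KEY = "tez.runtime.combiner.class";
    // Assumed value of MRJobConfig.COMBINE_CLASS_ATTR; not confirmed by this thread.
    static final String NEW_API_KEY = "mapreduce.job.combine.class";
    // Old-API key, quoted in the comment above.
    static final String OLD_API_KEY = "mapred.combiner.class";

    /**
     * Returns the MR combiner class name, or null when no combiner is configured.
     * Throws a descriptive exception for the inconsistent state described above,
     * instead of letting a null class reach reflection-based instantiation.
     */
    static String resolveMrCombiner(Map<String, String> conf) {
        if (conf.get(TEZ_COMBINER_KEY) == null) {
            return null; // no combiner requested at the Tez level
        }
        String newApi = conf.get(NEW_API_KEY);
        if (newApi != null) {
            return newApi;
        }
        String oldApi = conf.get(OLD_API_KEY);
        if (oldApi != null) {
            return oldApi;
        }
        throw new IllegalStateException(TEZ_COMBINER_KEY + " is set, but neither "
            + NEW_API_KEY + " nor " + OLD_API_KEY + " names a combiner class");
    }
}
```

The point of the sketch is only that the inconsistent configuration is reported at lookup time with the offending keys named, rather than surfacing later as an NPE inside ReflectionUtils.newInstance.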

> MRCombiner throws NPE in MapredWordCount on master branch
> -
>
> Key: TEZ-1421
> URL: https://issues.apache.org/jira/browse/TEZ-1421
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
>Priority: Blocker
>
> I tested MapredWordCount against 70GB generated by RandomTextWriter. When a 
> Combiner runs, it throws an NPE. It looks like setCombinerClass doesn't work 
> correctly.
> {quote}
> Caused by: java.lang.RuntimeException: java.lang.NullPointerException
> at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at org.apache.tez.mapreduce.combine.MRCombiner.runOldCombiner(MRCombiner.java:122)
> at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:112)
> at org.apache.tez.runtime.library.common.shuffle.impl.MergeManager.runCombineProcessor(MergeManager.java:472)
> at org.apache.tez.runtime.library.common.shuffle.impl.MergeManager$InMemoryMerger.merge(MergeManager.java:605)
> at org.apache.tez.runtime.library.common.shuffle.impl.MergeThread.run(MergeThread.java:89)
> {quote}





[jira] [Commented] (TEZ-2191) Simulation improvements to MockDAGAppMaster

2015-03-13 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360106#comment-14360106
 ] 

Rajesh Balamohan commented on TEZ-2191:
---

+1. lgtm.

- The heartbeatTime and heartbeatCPU times are not used in the test cases. Is the 
intention to use them later as needed? Also, they have microsecond 
accuracy. Is that intentional?


> Simulation improvements to MockDAGAppMaster
> ---
>
> Key: TEZ-2191
> URL: https://issues.apache.org/jira/browse/TEZ-2191
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Attachments: TEZ-2191.1.patch, TEZ-2191.2.patch, TEZ-2191.3.patch
>
>






[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents

2015-03-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360078#comment-14360078
 ] 

Siddharth Seth commented on TEZ-776:


Please see my first comment on the posted document, which questions the CPU 
efficiency of the ODR approach. *This converts what is primarily an MxN 
memory problem into an MxN CPU problem.* That's an approach I wouldn't 
even consider, except that we already have an (unnecessary) MxN 
CPU issue for ScatterGather edges, which I didn't realize earlier, and that 
single case becomes better in terms of memory. Other edge types, in 
fact, move from a < MxN memory/CPU issue to a guaranteed MxN CPU issue. This 
forces CPU inefficiency on ALL edge types.
Introducing an N^2 algorithm (where N is non-trivial) when a more optimal 
approach exists is not the right way to go. The fact that routing is a 
fraction of AM CPU, to me, says that we have other avenues to improve CPU 
utilization along with memory, rather than serving as justification to put 
in an inefficient algorithm. Numbers posted previously show CPU 
efficiency improving marginally or remaining roughly the same for 
ScatterGather, but degrading quite a bit for OneToOne.
If there were no API changes involved, this could be iterated upon more easily, 
since it does improve things for the most commonly used case and users wouldn't 
know the difference. However, API changes are involved here, which are 
avoidable, and are also required in the approach of moving events into the 
edge. Hence my previous comment and suggestion.

bq. some yet to be built concept. Other approaches could be implemented in 
full, tested, profiled and verified
I'm at a loss here. Are you suggesting that we discuss options based on 
patches? Surely we can reason about and discuss alternative approaches without 
code changes being in place? I'm sure it makes sense for you to go ahead and 
iterate on the approach, test it, etc. However, if there are alternatives that 
have been under discussion from day 1 but haven't been fully discussed, there 
is a chance that the final approach and patch will need to change.
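
As a rough illustration of the memory-versus-CPU trade-off argued above, here is a back-of-the-envelope Java sketch. It is not a description of the actual patch: EventRoutingCost and its methods are hypothetical, the 64 bytes per event is the figure from the issue description, and the model assumes a ScatterGather-style edge in which each of M producers emits one event that every one of N consumers needs.

```java
/** Hypothetical cost model (not Tez code) for M producers and N consumers. */
final class EventRoutingCost {
    /** Per-event size quoted in the TEZ-776 description. */
    static final long BYTES_PER_EVENT = 64;

    /** Eager routing: one stored event per (producer, consumer) pair. */
    static long eagerStoredBytes(long producers, long consumers) {
        return Math.multiplyExact(
            Math.multiplyExact(producers, consumers), BYTES_PER_EVENT);
    }

    /** On-demand routing: only the M source events are stored. */
    static long onDemandStoredBytes(long producers) {
        return Math.multiplyExact(producers, BYTES_PER_EVENT);
    }

    /**
     * On-demand routing: each consumer checks every source event,
     * so the number of routing checks is M x N for every edge type.
     */
    static long onDemandRoutingChecks(long producers, long consumers) {
        return Math.multiplyExact(producers, consumers);
    }
}
```

Under these assumptions, with M = N = 10,000 the eager scheme stores 6.4 GB of events, while the on-demand scheme stores 640 KB but performs 100 million routing checks. That is the sense in which the memory cost is traded for a CPU cost of the same order.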

> Reduce AM mem usage caused by storing TezEvents
> ---
>
> Key: TEZ-776
> URL: https://issues.apache.org/jira/browse/TEZ-776
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Bikas Saha
> Attachments: TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, 
> TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, 
> TEZ-776.ondemand.6.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, 
> With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, 
> events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, 
> without_patch_jmc_output_of_AM.png
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically 
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks 
> that can be processed.





[jira] [Created] (TEZ-2198) Fix sorter spill counts

2015-03-13 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created TEZ-2198:
-

 Summary: Fix sorter spill counts
 Key: TEZ-2198
 URL: https://issues.apache.org/jira/browse/TEZ-2198
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Rajesh Balamohan


Prior to pipelined shuffle, Tez merged all spilled data into a single file.  
This ended up creating one index file and one output file. In this context, 
TaskCounter.ADDITIONAL_SPILL_COUNT was referred to as the number of additional 
spills, and no counter was needed to track the number of merges.

With pipelined shuffle, there is no final merge, and ADDITIONAL_SPILL_COUNT 
would be misleading, as these spills are direct output files that are consumed 
by the consumers.

It would be good to have the following:
- ADDITIONAL_SPILL_COUNT: represents the spills that are needed by the task to 
generate the final merged output
- TOTAL_SPILLS: represents the total number of shuffle directories (index + 
output files) that got created at the end of processing.

For example, assume the sorter generated 5 spills in an attempt.
Without pipelining:
===================
ADDITIONAL_SPILL_COUNT = 5 <-- Additional spills involved in sorting
TOTAL_SPILLS = 1 <-- Final merged output

With pipelining:
================
ADDITIONAL_SPILL_COUNT = 5 <-- Additional spills involved in sorting
TOTAL_SPILLS = 0 <-- No final output







Success: TEZ-1909 PreCommit Build #297

2015-03-13 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-1909
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/297/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2755 lines...]

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12704361/TEZ-1909-2.patch
  against master revision 55d7fce.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/297//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/297//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
53b683416b271c75c5cf70cc6d1cb7b38a777a16 logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #296
Archived 44 artifacts
Archive block size is 32768
Received 6 blocks and 2544544 bytes
Compression is 7.2%
Took 10 min
Description set: TEZ-1909
Recording test results
Email was triggered for: Success
Sending email for trigger: Success
ERROR: H0 is offline; cannot locate JDK 1.7 (latest)




###
## FAILED TESTS (if any) 
##
All tests passed