[jira] [Updated] (TEZ-3161) Allow task to report different kinds of errors - fatal / kill

2016-04-02 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-3161:

Attachment: TEZ-3161.5.txt

Updated patch with the test fixed.

> Allow task to report different kinds of errors - fatal / kill
> -
>
> Key: TEZ-3161
> URL: https://issues.apache.org/jira/browse/TEZ-3161
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Attachments: TEZ-3161.1.txt, TEZ-3161.2.txt, TEZ-3161.3.txt, 
> TEZ-3161.4.txt, TEZ-3161.5.txt
>
>
> In some cases, task failures will be the same across all attempts - e.g. 
> exceeding memory utilization on an operation. In this case, there's no point 
> in running another attempt of the same task.
> There are other cases where a task may want to mark itself as KILLED - i.e. 
> for a temporary error. An example of this is pipelined shuffle.
> Tez should allow both operations.
> cc [~vikram.dixit], [~rajesh.balamohan]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3161) Allow task to report different kinds of errors - fatal / kill

2016-04-02 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223070#comment-15223070
 ] 

TezQA commented on TEZ-3161:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12796697/TEZ-3161.4.txt
  against master revision 0c7e1c5.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 18 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 35 javac 
compiler warnings (more than the master's current 33 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests:
  org.apache.tez.test.TestFaultTolerance
  org.apache.tez.dag.history.logging.ats.TestHistoryEventTimelineConversion
  org.apache.tez.dag.app.rm.TestContainerReuse

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1604//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1604//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1604//console

This message is automatically generated.






Failed: TEZ-3161 PreCommit Build #1604

2016-04-02 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3161
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1604/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 4784 lines...]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :tez-dag
[INFO] Build failures were ignored.





==
==
Adding comment to Jira.
==
==


Comment added.
ff8e880f9cab8ceaa402fc136f0abb85c2cea747 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
9 tests failed.
FAILED:  
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources

Error Message:

Wanted but not invoked:
taskSchedulerManagerForTest.taskAllocated(
0,
Mock for TA attempt_0_0001_0_01_04_1,
,
Container: [ContainerId: container_1_0001_01_01, NodeId: host1:0, 
NodeHttpAddress: host1:0, Resource: , Priority: 1, 
Token: null, ]
);
-> at 
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources(TestContainerReuse.java:1272)

However, there were other interactions with this mock:
taskSchedulerManagerForTest.init(
Configuration: core-default.xml, core-site.xml, yarn-default.xml, 
yarn-site.xml
);
-> at 
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources(TestContainerReuse.java:1143)

taskSchedulerManagerForTest.setConfig(
Configuration: core-default.xml, core-site.xml, yarn-default.xml, 
yarn-site.xml
);
-> at 
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources(TestContainerReuse.java:1143)

taskSchedulerManagerForTest.serviceInit(
Configuration: core-default.xml, core-site.xml, yarn-default.xml, 
yarn-site.xml
);
-> at 
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources(TestContainerReuse.java:1143)

taskSchedulerManagerForTest.start();
-> at 
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources(TestContainerReuse.java:1144)

taskSchedulerManagerForTest.serviceStart();
-> at 
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources(TestContainerReuse.java:1144)

taskSchedulerManagerForTest.instantiateSchedulers(
"host",
0,
"",
Mock for AppContext, hashCode: 833038353
);
-> at 
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources(TestContainerReuse.java:1144)

taskSchedulerManagerForTest.getContainerSignatureMatcher();
-> at 

[jira] [Commented] (TEZ-3077) TezClient.waitTillReady should support timeout

2016-04-02 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223058#comment-15223058
 ] 

Siddharth Seth commented on TEZ-3077:
-

Thanks for the updated patch [~kshukla]. This looks a lot better in terms of 
the APIs. Some comments.

- Can the existing preWarm method be changed to invoke the new one with a 
timeout of 0, similar to what has been done for the existing waitTillReady 
method?
- In waitTillReady
{code}
+  if ((timeout > 0) &&
+      Time.monotonicNow() - startTime >= timeout) {
+    return false;
{code}
This check should come after checking whether the updated status is READY. 
Otherwise we could end up timing out in the last iteration even if the state 
did change to READY.

{code}
long sleepTime = (SLEEP_FOR_READY > timeout) ?
    SLEEP_FOR_READY - timeout : SLEEP_FOR_READY;
{code}
Should this be
{code}
long sleepTime = (SLEEP_FOR_READY > timeout) ?
    timeout : SLEEP_FOR_READY;
{code}
Even better would be to sleep for whatever time is actually left.
{code}
long now = Time.monotonicNow();
if (startTime + timeout > now) {
  long sleepTime = Math.min(SLEEP_FOR_READY, startTime + timeout - now);
  Thread.sleep(sleepTime);
} else {
  return false;
}
{code}
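
Putting the two comments above together, the loop could look roughly like the following. This is only a sketch under stated assumptions, not the patch itself: the class name, the Supplier-based status check, the 100 ms poll interval and the System.nanoTime()-based clock (standing in for Time.monotonicNow()) are all hypothetical, and the interrupt handling is simplified to returning false.

```java
import java.util.function.Supplier;

class WaitTillReadySketch {
  // Hypothetical poll interval; stands in for SLEEP_FOR_READY in the patch.
  static final long SLEEP_FOR_READY_MS = 100;

  /**
   * Polls until isReady reports true or timeoutMs elapses.
   * A timeout of 0 or less means "wait indefinitely".
   * Returns true if ready, false on timeout or interruption.
   */
  static boolean waitTillReady(Supplier<Boolean> isReady, long timeoutMs) {
    long startTime = monotonicNowMs();
    while (true) {
      // Check the status first, so that a change to READY during the
      // final iteration is not reported as a timeout.
      if (isReady.get()) {
        return true;
      }
      long sleepTime = SLEEP_FOR_READY_MS;
      if (timeoutMs > 0) {
        long remaining = startTime + timeoutMs - monotonicNowMs();
        if (remaining <= 0) {
          return false;
        }
        // Sleep only for whatever time is actually left.
        sleepTime = Math.min(SLEEP_FOR_READY_MS, remaining);
      }
      try {
        Thread.sleep(sleepTime);
      } catch (InterruptedException e) {
        // Simplification for this sketch: restore the flag and give up.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  // Monotonic clock in milliseconds (assumption: replaces Time.monotonicNow()).
  static long monotonicNowMs() {
    return System.nanoTime() / 1_000_000L;
  }
}
```

With this ordering, a call such as waitTillReady(statusCheck, 0) keeps the existing unbounded behaviour, while a positive timeout never sleeps past the deadline.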

On the unit test, could you please look at testStopRetriesUntilTimeout and 
see if a test can be added along those lines, i.e. one that actually validates 
that attempts were made to get the appReport and that the call finally timed 
out, rather than returning success.


> TezClient.waitTillReady should support timeout
> --
>
> Key: TEZ-3077
> URL: https://issues.apache.org/jira/browse/TEZ-3077
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Kuhu Shukla
> Attachments: TEZ-3077.001.patch, TEZ-3077.002.patch
>
>
> Also preWarm.





[jira] [Commented] (TEZ-3177) Non-DAG events should use the session domain or no domain if the data does not need protection

2016-04-02 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223051#comment-15223051
 ] 

Siddharth Seth commented on TEZ-3177:
-

{code}
+  if (historyACLPolicyManager != null
+      && sessionDomainId != null && !sessionDomainId.isEmpty()
+      && domainId != null && !domainId.isEmpty()) {
+    if (HistoryEventType.isDAGSpecificEvent(event.getHistoryEvent().getEventType())) {
       historyACLPolicyManager.updateTimelineEntityDomain(entities[i], domainId);
+    } else {
+      historyACLPolicyManager.updateTimelineEntityDomain(entities[i], sessionDomainId);
     }
{code}

Will the DAG-specific domain id and the session domain id both either be set 
or unset? Do we end up missing the domainId in some cases if both are unset? 
It may be better to flip the check: if the event is DAG-specific 
(dagDomainType), null check the dagDomainId; otherwise null check the 
sessionDomainId.
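
A minimal sketch of the flipped check suggested above, assuming a hypothetical helper that only picks the id to apply (the class and method names are illustrative and not part of the patch):

```java
class DomainSelectionSketch {
  /**
   * Picks the domain id for an event: the DAG domain for DAG-specific
   * events, the session domain otherwise. Returns null when the relevant
   * id is unset, in which case no domain would be applied.
   */
  static String selectDomainId(boolean isDagSpecificEvent,
                               String dagDomainId,
                               String sessionDomainId) {
    // Decide on the event type first, then null check only the id
    // that is relevant for that type.
    String candidate = isDagSpecificEvent ? dagDomainId : sessionDomainId;
    return (candidate == null || candidate.isEmpty()) ? null : candidate;
  }
}
```

The caller would then invoke updateTimelineEntityDomain only when the result is non-null, so a missing session domain no longer suppresses the DAG domain and vice versa.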

Rest looks good.


> Non-DAG events should use the session domain or no domain if the data does 
> not need protection 
> ---
>
> Key: TEZ-3177
> URL: https://issues.apache.org/jira/browse/TEZ-3177
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Hitesh Shah
> Assignee: Hitesh Shah
> Attachments: TEZ-3177.1.patch
>
>
> There have been issues noticed where when using dag specific domains, 
> container events get generated under different dags causing issues as they 
> are updated using different domains. 





[jira] [Updated] (TEZ-3161) Allow task to report different kinds of errors - fatal / kill

2016-04-02 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-3161:

Attachment: TEZ-3161.4.txt

Updated patch with the following changes.
- FailureType renamed to TaskFailureType
- Have retained the APIs introduced in the patch. The existing API is going to 
get confusing otherwise. Added specific javadocs on fatalError explaining the 
behaviour, along with deprecation. This seems like the least confusing option 
to me.
- Marked killSelf as private
- Renamed unsuccessfulEnd to taskFailureType
- Added writing to history. Is there some place where ATS data is being read 
back as well? I couldn't find one.
- Changed the TaskImpl log line to be easier to understand

bq. Wouldn't there be only one specific termination cause to indicate that the 
user-code told the framework to abort itself or kill itself?
The TaskAttemptEndReason is set based on which component reported the error - 
Input / Processor / Output - at least from the task. There's a bunch of other 
EndReasons which are independent of this. TaskFailureType would now indicate 
the kind of failure on top of whatever EndReason is set.
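
To illustrate how a failure type can sit alongside the end reason, here is a rough sketch of the retry decision it enables. Everything in it is an assumption made for illustration: the enum values, the AttemptOutcome type, the method name, and the rule that killed attempts do not count against the failure limit are not taken from the patch.

```java
class RetryDecisionSketch {
  // Hypothetical values; the thread only names the type TaskFailureType.
  enum TaskFailureType { NON_FATAL, FATAL }

  // Hypothetical outcome of one attempt as seen by the framework.
  enum AttemptOutcome { FAILED, KILLED }

  /**
   * Decides whether another attempt of the same task should be scheduled.
   * failedAttempts includes the attempt that has just ended.
   */
  static boolean shouldScheduleAnotherAttempt(AttemptOutcome outcome,
                                              TaskFailureType failureType,
                                              int failedAttempts,
                                              int maxFailedAttempts) {
    if (outcome == AttemptOutcome.KILLED) {
      // Assumption: a self-kill signals a temporary error, so it is
      // retried and not counted as a failure.
      return true;
    }
    if (failureType == TaskFailureType.FATAL) {
      // The same failure is expected on every attempt; do not retry.
      return false;
    }
    return failedAttempts < maxFailedAttempts;
  }
}
```

The end reason (Input / Processor / Output error and so on) stays a separate piece of information; only the failure type feeds the retry decision in this sketch.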






[jira] [Commented] (TEZ-3182) linux superuser use maven compile bower always fail

2016-04-02 Thread shenxianqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15222898#comment-15222898
 ] 

shenxianqiang commented on TEZ-3182:


Thanks, Sreenath Somarajapuram.
My bad.
If I set the default bower-allow-root to an empty value, is that OK?

> linux superuser use maven compile bower always fail
> ---
>
> Key: TEZ-3182
> URL: https://issues.apache.org/jira/browse/TEZ-3182
> Project: Apache Tez
> Issue Type: Bug
> Components: UI
> Affects Versions: 0.6.2, 0.8.2
> Environment: linux rh6
> Reporter: shenxianqiang
> Assignee: shenxianqiang
> Priority: Trivial
> Attachments: TEZ-3182.1.patch, TEZ-3182.patch
>
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> When I am root, the 'mvn clean package -DskipTests=true' command always 
> fails.
> [INFO] --- exec-maven-plugin:1.3.2:exec (Bower install) @ tez-ui ---
> bower ESUDO Cannot be run with sudo
> Additional error details:
> Since bower is a user command, there is no need to execute it with superuser 
> permissions.
> If you're having permission errors when using bower without sudo, please 
> spend a few minutes learning more about how your system should work and make 
> any necessary repairs.
> http://www.joyent.com/blog/installing-node-and-npm
> https://gist.github.com/isaacs/579814
> You can however run a command with sudo using --allow-root option
> I have to modify pom.xml. Why not modify pom.xml in future?


