[jira] [Updated] (TEZ-2401) Tez UI: All-dag page has duration keep counting for KILLED dag.

2015-05-04 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-2401:
--
Summary: Tez UI: All-dag page has duration keep counting for KILLED dag.  
(was: All-dag page has duration keep counting for KILLED dag.)

 Tez UI: All-dag page has duration keep counting for KILLED dag.
 ---

 Key: TEZ-2401
 URL: https://issues.apache.org/jira/browse/TEZ-2401
 Project: Apache Tez
  Issue Type: Bug
  Components: UI
Affects Versions: 0.7.0
Reporter: Tassapol Athiapinya
Assignee: Prakash Ramachandran
Priority: Critical
 Attachments: TEZ-2401.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-2401) All-dag page has duration keep counting for KILLED dag.

2015-05-04 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-2401:
--
Attachment: TEZ-2401.1.patch

Trivial patch. [~Sreenath], please review.

 All-dag page has duration keep counting for KILLED dag.
 ---

 Key: TEZ-2401
 URL: https://issues.apache.org/jira/browse/TEZ-2401
 Project: Apache Tez
  Issue Type: Bug
  Components: UI
Affects Versions: 0.7.0
Reporter: Tassapol Athiapinya
Assignee: Prakash Ramachandran
Priority: Critical
 Attachments: TEZ-2401.1.patch








[jira] [Created] (TEZ-2406) TEZ-UI: Display per-io counter columns in task and attempt pages under vertex

2015-05-04 Thread Sreenath Somarajapuram (JIRA)
Sreenath Somarajapuram created TEZ-2406:
---

 Summary: TEZ-UI: Display per-io counter columns in task and 
attempt pages under vertex
 Key: TEZ-2406
 URL: https://issues.apache.org/jira/browse/TEZ-2406
 Project: Apache Tez
  Issue Type: Bug
Reporter: Sreenath Somarajapuram


- We will auto-populate all the counter names including io counter names to the 
tasks (under a vertex) and task attempts (under task, vertex).
- To enable navigation the counter names will be searchable in the dropdown for 
the counter selection.
- Per-io counter names will not be stored in the personalization settings given 
they are dag / vertex specific.





[jira] [Updated] (TEZ-2404) Handle DataMovementEvent before its TaskAttemptCompletedEvent

2015-05-04 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2404:

Attachment: TEZ-2404-1.patch

 Handle DataMovementEvent before its TaskAttemptCompletedEvent
 -

 Key: TEZ-2404
 URL: https://issues.apache.org/jira/browse/TEZ-2404
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Attachments: TEZ-2404-1.patch


 TEZ-2325 routes TASK_ATTEMPT_COMPLETED_EVENT directly to the attempt, but this 
 causes a recovery issue. Recovery requires that the DataMovementEvent is handled 
 before the TaskAttemptCompletedEvent; otherwise the DataMovementEvent may be lost 
 during recovery and its dependent tasks may hang.
 There are 2 ways to fix this issue:
 1. Still route TaskAttemptCompletedEvent through the Vertex.
 2. Route DataMovementEvent before TaskAttemptCompletedEvent in 
 TezTaskAttemptListener.
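Option 2 amounts to an ordering guarantee inside TezTaskAttemptListener. The Java sketch below only illustrates that guarantee under stated assumptions: `EventOrderingSketch`, `EventKind`, `Event` and `orderForRouting` are hypothetical names (not Tez's TezEvent API), and it assumes the events from one heartbeat arrive as a single batch that can be reordered before routing.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified event model; Tez's real events are wrapped in TezEvent.
class EventOrderingSketch {
    enum EventKind { DATA_MOVEMENT, OTHER, TASK_ATTEMPT_COMPLETED }

    static final class Event {
        final EventKind kind;
        final String payload;

        Event(EventKind kind, String payload) {
            this.kind = kind;
            this.payload = payload;
        }
    }

    /**
     * Returns a copy of the batch in which completion events come last.
     * List.sort is stable, so all other events keep their relative order.
     */
    static List<Event> orderForRouting(List<Event> batch) {
        List<Event> ordered = new ArrayList<>(batch);
        ordered.sort(Comparator.comparingInt(
            e -> e.kind == EventKind.TASK_ATTEMPT_COMPLETED ? 1 : 0));
        return ordered;
    }

    public static void main(String[] args) {
        List<Event> batch = List.of(
            new Event(EventKind.TASK_ATTEMPT_COMPLETED, "attempt_0"),
            new Event(EventKind.DATA_MOVEMENT, "dme_1"),
            new Event(EventKind.DATA_MOVEMENT, "dme_2"));
        // Prints dme_1, then dme_2, then attempt_0.
        for (Event e : orderForRouting(batch)) {
            System.out.println(e.kind + " " + e.payload);
        }
    }
}
```

Relying on a stable sort keeps the DataMovementEvents in their original relative order, which matters if consumers expect them in emission order.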





[jira] [Commented] (TEZ-2405) PipelinedSorter can throw NPE with custom compartor

2015-05-04 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526314#comment-14526314
 ] 

Gopal V commented on TEZ-2405:
--

[~rajesh.balamohan]: the patch looks good - +1

But the code confusion remains - we have to investigate dropping the old MR 
InputBuffer impl which we can't fix anymore.

{code}
public class InputBuffer extends FilterInputStream {
...
 public void reset(byte[] input, int start, int length) {
  this.buf = input;
  this.count = start+length;
  this.pos = start;
...
}
public int getPosition() { return pos; }
public int getLength() { return count; }
{code}

This makes it obvious that InputBuffer.getLength() is unlike any other 
getLength() call: it returns the end offset (start + length), effectively a 
capacity parameter of unclear meaning (i.e. the other areas of the byte[] array 
might be owned by other buffers).
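To make the offset-versus-length distinction concrete, here is a minimal Java sketch. `RangeBuffer` is a hypothetical class that copies only the quoted `reset()` arithmetic; it is not Hadoop's `InputBuffer` and omits all stream behaviour. It shows that `getLength()` yields `start + length`, so right after `reset()` the byte count of the current view is `getLength() - getPosition()`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in that mimics only the reset() semantics quoted above; not Hadoop's InputBuffer.
class RangeBuffer {
    private byte[] buf = new byte[0];
    private int pos;    // current read offset into buf
    private int count;  // exclusive END offset of the readable range, not a length

    void reset(byte[] input, int start, int length) {
        this.buf = input;
        this.count = start + length;
        this.pos = start;
    }

    int getPosition() { return pos; }

    // Same contract as the quoted getLength(): an end offset into a possibly shared array.
    int getLength() { return count; }

    // The number of bytes that actually belong to this buffer's current view.
    int remaining() { return count - pos; }

    byte[] readableBytes() { return Arrays.copyOfRange(buf, pos, count); }

    public static void main(String[] args) {
        byte[] shared = "AAAAKEYBBBB".getBytes(StandardCharsets.US_ASCII);
        RangeBuffer b = new RangeBuffer();
        b.reset(shared, 4, 3);              // view only the bytes "KEY"
        System.out.println(b.getLength());  // 7: the end offset, not the length
        System.out.println(b.remaining());  // 3: the real length
        System.out.println(new String(b.readableBytes(), StandardCharsets.US_ASCII)); // KEY
    }
}
```

A caller that treats `getLength()` as a byte count here would read 7 bytes and run into "BBBB", which belongs to someone else.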

Post 0.7.x, we can rewrite this codepath to avoid this particular anti-pattern, 
by dropping references to the old DataInputBuffer impl.

 PipelinedSorter can throw NPE with custom compartor
 ---

 Key: TEZ-2405
 URL: https://issues.apache.org/jira/browse/TEZ-2405
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Priority: Critical
 Attachments: TEZ-2405.1.patch


 If custom comparators are used, PipelinedSorter can throw an NPE depending on 
 the custom comparator implementation.
 {noformat}
 ], TaskAttempt 1 failed, info=[Error: Failure while running 
 task:java.lang.NullPointerException
   at 
 org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SpanIterator.compareTo(PipelinedSorter.java:837)
   at 
 org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SpanIterator.compareTo(PipelinedSorter.java:767)
   at java.util.PriorityQueue.siftUpComparable(PriorityQueue.java:637)
   at java.util.PriorityQueue.siftUp(PriorityQueue.java:629)
   at java.util.PriorityQueue.offer(PriorityQueue.java:329)
   at java.util.PriorityQueue.add(PriorityQueue.java:306)
   at 
 org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SpanMerger.add(PipelinedSorter.java:996)
   at 
 org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SpanMerger.next(PipelinedSorter.java:1065)
   at 
 org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$PartitionFilter.next(PipelinedSorter.java:936)
   at 
 org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.spill(PipelinedSorter.java:366)
   at 
 org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.flush(PipelinedSorter.java:406)
   at 
 org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.close(OrderedPartitionedKVOutput.java:183)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.close(LogicalIOProcessorRuntimeTask.java:355)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 {noformat}
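The stack trace shows the NPE raised from `SpanIterator.compareTo` while `SpanMerger.add` inserts into a `java.util.PriorityQueue`, i.e. during a heap-based k-way merge of sorted spans. The sketch below is a generic, hypothetical illustration of that merge pattern (`SpanMergeSketch` and `SpanCursor` are not Tez classes) in which a cursor is only offered to the heap while it has a current key, so `compareTo` never dereferences a null key. The contents of TEZ-2405.1.patch are not shown here, so this does not claim to reproduce its actual fix or root cause.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration of a heap-based k-way merge; not Tez's SpanMerger/SpanIterator.
class SpanMergeSketch {
    /** Cursor over one sorted span; key() is null once the span is exhausted. */
    static final class SpanCursor implements Comparable<SpanCursor> {
        private final Iterator<String> it;
        private String key;

        SpanCursor(List<String> sortedKeys) {
            this.it = sortedKeys.iterator();
            advance();
        }

        boolean advance() {
            key = it.hasNext() ? it.next() : null;
            return key != null;
        }

        String key() { return key; }

        // Throws NullPointerException if either cursor is exhausted, hence the guards in merge().
        @Override
        public int compareTo(SpanCursor other) { return key.compareTo(other.key); }
    }

    /** Merges sorted spans, only ever offering cursors that still have a current key. */
    static List<String> merge(List<List<String>> spans) {
        PriorityQueue<SpanCursor> heap = new PriorityQueue<>();
        for (List<String> span : spans) {
            SpanCursor cursor = new SpanCursor(span);
            if (cursor.key() != null) {  // guard: never enqueue an empty span
                heap.add(cursor);
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            SpanCursor smallest = heap.poll();
            out.add(smallest.key());
            if (smallest.advance()) {    // guard: re-enqueue only while keys remain
                heap.add(smallest);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> spans = List.of(List.of("a", "d"), List.<String>of(), List.of("b", "c", "e"));
        System.out.println(merge(spans));  // [a, b, c, d, e]
    }
}
```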





Failed: TEZ-2404 PreCommit Build #609

2015-05-04 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2404
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/609/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 88 lines...]


==
==
Determining number of patched javac warnings.
==
==


/home/jenkins/tools/maven/latest/bin/mvn clean test -DskipTests -Ptest-patch  
/home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build/../patchprocess/patchJavacWarnings.txt
 21




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12730126/TEZ-2404-1.patch
  against master revision f6ea0fb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/609//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
fb5338efc023373eccdb268ccffb4b5e279534c9 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #608
Archived 3 artifacts
Archive block size is 32768
Received 0 blocks and 760141 bytes
Compression is 0.0%
Took 0.67 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
No tests ran.

[jira] [Created] (TEZ-2407) Drop references to the old DataInputBuffer impl in PipelinedSorter

2015-05-04 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created TEZ-2407:
-

 Summary: Drop references to the old DataInputBuffer impl in 
PipelinedSorter
 Key: TEZ-2407
 URL: https://issues.apache.org/jira/browse/TEZ-2407
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan








[jira] [Updated] (TEZ-2406) TEZ-UI: Display per-io counter columns in task and attempt pages under vertex

2015-05-04 Thread Sreenath Somarajapuram (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreenath Somarajapuram updated TEZ-2406:

Attachment: TEZ-2406.1.patch

[~pramachandran] Please help to get the patch in.

 TEZ-UI: Display per-io counter columns in task and attempt pages under vertex
 -

 Key: TEZ-2406
 URL: https://issues.apache.org/jira/browse/TEZ-2406
 Project: Apache Tez
  Issue Type: Bug
Reporter: Sreenath Somarajapuram
Assignee: Sreenath Somarajapuram
 Attachments: TEZ-2406.1.patch


 - We will auto-populate all the counter names including io counter names to 
 the tasks (under a vertex) and task attempts (under task, vertex).
 - To enable navigation the counter names will be searchable in the dropdown 
 for the counter selection.
 - Per-io counter names will not be stored in the personalization settings 
 given they are dag / vertex specific.





[jira] [Updated] (TEZ-2406) TEZ-UI: Display per-io counter columns in task and attempt pages under vertex

2015-05-04 Thread Sreenath Somarajapuram (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreenath Somarajapuram updated TEZ-2406:

Description: 
- We will auto-populate all the counter names including io counter names to the 
tasks (under a vertex) and task attempts (under task, vertex).
- To enable navigation the column names will be searchable in the pop-up for 
column selection.
- Per-io counter names will not be stored in the personalization settings given 
they are dag / vertex specific.

  was:
- We will auto-populate all the counter names including io counter names to the 
tasks (under a vertex) and task attempts (under task, vertex).
- To enable navigation the counter names will be searchable in the dropdown for 
the counter selection.
- Per-io counter names will not be stored in the personalization settings given 
they are dag / vertex specific.


 TEZ-UI: Display per-io counter columns in task and attempt pages under vertex
 -

 Key: TEZ-2406
 URL: https://issues.apache.org/jira/browse/TEZ-2406
 Project: Apache Tez
  Issue Type: Bug
Reporter: Sreenath Somarajapuram
Assignee: Sreenath Somarajapuram
 Attachments: TEZ-2406.1.patch


 - We will auto-populate all the counter names including io counter names to 
 the tasks (under a vertex) and task attempts (under task, vertex).
 - To enable navigation the column names will be searchable in the pop-up for 
 column selection.
 - Per-io counter names will not be stored in the personalization settings 
 given they are dag / vertex specific.





[jira] [Updated] (TEZ-2406) TEZ-UI: Display per-io counter columns in task and attempt pages under vertex

2015-05-04 Thread Sreenath Somarajapuram (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreenath Somarajapuram updated TEZ-2406:

Affects Version/s: 0.7.0

 TEZ-UI: Display per-io counter columns in task and attempt pages under vertex
 -

 Key: TEZ-2406
 URL: https://issues.apache.org/jira/browse/TEZ-2406
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Sreenath Somarajapuram
Assignee: Sreenath Somarajapuram
 Attachments: TEZ-2406.1.patch


 - We will auto-populate all the counter names including io counter names to 
 the tasks (under a vertex) and task attempts (under task, vertex).
 - To enable navigation the column names will be searchable in the pop-up for 
 column selection.
 - Per-io counter names will not be stored in the personalization settings 
 given they are dag / vertex specific.





[jira] [Commented] (TEZ-2401) Tez UI: All-dag page has duration keep counting for KILLED dag.

2015-05-04 Thread Sreenath Somarajapuram (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526336#comment-14526336
 ] 

Sreenath Somarajapuram commented on TEZ-2401:
-

Do we have a purpose for the following?
{code}
// unixtimestamp is in seconds. javascript expects milliseconds.
if (endTime < startTime || !!endTime) {
  end = new Date().getTime();
}
{code}

 Tez UI: All-dag page has duration keep counting for KILLED dag.
 ---

 Key: TEZ-2401
 URL: https://issues.apache.org/jira/browse/TEZ-2401
 Project: Apache Tez
  Issue Type: Bug
  Components: UI
Affects Versions: 0.7.0
Reporter: Tassapol Athiapinya
Assignee: Prakash Ramachandran
Priority: Critical
 Attachments: TEZ-2401.1.patch








[jira] [Commented] (TEZ-2401) Tez UI: All-dag page has duration keep counting for KILLED dag.

2015-05-04 Thread Prakash Ramachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526342#comment-14526342
 ] 

Prakash Ramachandran commented on TEZ-2401:
---

It has been changed to
{code}
if (endTime < startTime) {
{code}
in the patch. The check is there to get the current running time where 
applicable, e.g. formatDuration(startTime, -1) gives the time elapsed until now.
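A hypothetical Java transliteration of the duration rule above (the real helper is JavaScript in the Tez UI; `DurationSketch.durationMillis` is an illustrative name). It assumes millisecond timestamps and takes the current time as a parameter so the result is deterministic: an `endTime` earlier than `startTime`, such as -1, means "still running", while any recorded end time stops the clock.

```java
// Hypothetical Java transliteration of the rule above; the real helper is JavaScript in the Tez UI.
class DurationSketch {
    /**
     * Elapsed milliseconds between startTime and endTime.
     * An endTime earlier than startTime (for example -1) means "still running",
     * so the supplied current time is used instead; a recorded end time is used as-is.
     */
    static long durationMillis(long startTime, long endTime, long nowMillis) {
        long end = endTime < startTime ? nowMillis : endTime;
        return end - startTime;
    }

    public static void main(String[] args) {
        System.out.println(durationMillis(1_000L, 4_000L, 9_000L)); // 3000: finished, uses endTime
        System.out.println(durationMillis(1_000L, -1L, 9_000L));    // 8000: running, uses now
    }
}
```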

 Tez UI: All-dag page has duration keep counting for KILLED dag.
 ---

 Key: TEZ-2401
 URL: https://issues.apache.org/jira/browse/TEZ-2401
 Project: Apache Tez
  Issue Type: Bug
  Components: UI
Affects Versions: 0.7.0
Reporter: Tassapol Athiapinya
Assignee: Prakash Ramachandran
Priority: Critical
 Attachments: TEZ-2401.1.patch








[jira] [Commented] (TEZ-2406) TEZ-UI: Display per-io counter columns in task and attempt pages under vertex

2015-05-04 Thread Prakash Ramachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526605#comment-14526605
 ] 

Prakash Ramachandran commented on TEZ-2406:
---

Patch generally looks fine.

* The select-all checkbox should have a label, and it should also be positioned 
properly on Chrome.
* The columnSelectorMessage message and the function that extracts the names of 
per-io counters can be shared across the views.
* Would it also be possible to highlight (color?) the per-io counters in the 
selection box so that the user is aware which ones they are?

 TEZ-UI: Display per-io counter columns in task and attempt pages under vertex
 -

 Key: TEZ-2406
 URL: https://issues.apache.org/jira/browse/TEZ-2406
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Sreenath Somarajapuram
Assignee: Sreenath Somarajapuram
 Attachments: TEZ-2406.1.patch


 - We will auto-populate all the counter names including io counter names to 
 the tasks (under a vertex) and task attempts (under task, vertex).
 - To enable navigation the column names will be searchable in the pop-up for 
 column selection.
 - Per-io counter names will not be stored in the personalization settings 
 given they are dag / vertex specific.





[jira] [Updated] (TEZ-2401) Tez UI: All-dag page has duration keep counting for KILLED dag.

2015-05-04 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-2401:
--
Attachment: TEZ-2401.2.patch

Thanks [~Sreenath], addressed review comments.

 Tez UI: All-dag page has duration keep counting for KILLED dag.
 ---

 Key: TEZ-2401
 URL: https://issues.apache.org/jira/browse/TEZ-2401
 Project: Apache Tez
  Issue Type: Bug
  Components: UI
Affects Versions: 0.7.0
Reporter: Tassapol Athiapinya
Assignee: Prakash Ramachandran
Priority: Critical
 Attachments: TEZ-2401.1.patch, TEZ-2401.2.patch








[jira] [Updated] (TEZ-2076) Tez framework to extract/analyze data stored in ATS for specific dag

2015-05-04 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-2076:
--
Attachment: TEZ-2076.10.patch


ATSImportTool
- Fixed docs
- Fixed logging in case of exception
- Fixed x.y.z for version info
- Packaged the tool as a fat jar. (--atsAddress=http://atsServer:port can 
optionally be provided on the command line; otherwise it is picked up from the 
$HADOOP_CONF_DIR location.)
- Usage:
{noformat}
usage: java -cp 
$HADOOP_CONF_DIR/:./target/tez-history-parser-x.y.z-SNAPSHOT-jar-with-dependencies.jar
 org.apache.tez.history.ATSImportTool
--atsAddress atsAddress     Optional. ATS address (e.g. http://clusterATSNode:8188)
--dagId dagId               DagId that needs to be downloaded
--downloadDir downloadDir   download directory where data needs to be downloaded
--help                      print help
{noformat}

 What happens when some of the data is downloaded but some fails to?
- This would require parsing the downloaded data (e.g. ATS goes down in the 
middle of a download). Currently this is not checked and would throw an 
exception. However, we would get partial data (i.e. as and when a batch is 
downloaded, it gets written to the zip file). Not sure if we need this 
validation feature. I believe an exception should be good for v1.

 What happens if the tool is run when a dag is still in progress? Will it 
 give invalid data back? Should that case be handled by throwing an error or 
 just having the user warned as needed?
- Currently, if data is available (even partial in the case of running jobs) it 
would be downloaded. Is the suggestion not to download if job is in progress 
(e.g RUNNING, INITING, SUBMITTED)?

 Maybe BaseInfo and then use abstract class?
- Fixed. Renamed AbstractInfo to BaseInfo

 Should all info objects representing the data be moved to a package say 
 parser.datamodel ?
- Moved all info objects to parser.datamodel. Also created BaseParser, which 
can link task, vertex, dag etc. for reverse lookups.

 How is versioning being handled in the serialized zip structure? Also, why 
 json as compared to say a protobuf structure?
- No explicit version is maintained in the zip structure. Would adding 
tez-version be helpful?
- Moving back and forth from DAG -> TaskAttempt and TaskAttempt -> DAG can be 
complex in protobuf. Hence the objects are maintained as in-memory POJO 
structures after parsing JSON.

 What if there are 100,000 attempts? or more? Does this require a large 
 memory footprint?
- No, the zip file can have a large number of small part files. Each of them can 
contain some amount of task, attempt, vertex, and dag information. As and when a 
part file is parsed, the JSON object pertaining to that part file is released, 
so there wouldn't be much memory pressure during parsing. However, the size of 
the DAG in-memory representation (POJO) depends on the size of the job. I will 
post the memory details soon.
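The one-part-at-a-time parsing described above can be sketched with `java.util.zip.ZipInputStream`. This is a minimal illustration under assumptions: the part-file names (`part-0`, ...), the `Consumer<String>` parser hook and `PartFileReaderSketch` itself are hypothetical, and the real tool parses JSON into its own info objects rather than handing raw text to a callback.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch; not the ATSImportTool / parser code from the TEZ-2076 patch.
class PartFileReaderSketch {
    /**
     * Streams the zip one entry at a time and hands each part's text to the parser.
     * Only the current part is held as raw text; it becomes unreachable once the
     * parser returns, so the raw JSON of earlier parts is not retained.
     */
    static int forEachPart(InputStream zipData, Consumer<String> parsePart) throws IOException {
        int parts = 0;
        try (ZipInputStream zin = new ZipInputStream(zipData)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // readAllBytes() stops at the end of the current entry.
                String text = new String(zin.readAllBytes(), StandardCharsets.UTF_8);
                parsePart.accept(text);
                parts++;
            }
        }
        return parts;
    }

    // Builds a small in-memory zip so the sketch runs without files on disk.
    static byte[] buildZip(List<String> parts) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            for (int i = 0; i < parts.size(); i++) {
                zout.putNextEntry(new ZipEntry("part-" + i));
                zout.write(parts.get(i).getBytes(StandardCharsets.UTF_8));
                zout.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = buildZip(List.of("{\"entity\":\"task_1\"}", "{\"entity\":\"attempt_1\"}"));
        List<Integer> sizes = new ArrayList<>();
        int n = forEachPart(new ByteArrayInputStream(zip), text -> sizes.add(text.length()));
        System.out.println(n + " parts, sizes " + sizes);
    }
}
```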

 Should serialized data be loaded on an on-demand basis? Or does the analyser 
 always take an initial hit to load all data into memory?
- It might be memory efficient, but would make analysis hard. For analysis, we 
would like to move back and forth from DAG -> TaskAttempt and vice versa. This 
calls for all objects to be present in memory.

 It seems like we have 2 data models. The runtime model and the analyser data 
 model. It is going to be hard to keep them in sync. Any suggestions on how 
 we can re-use a common model?
- No; ATS data is parsed and represented as in-memory POJOs via the parser. The 
analyzer works on the in-memory (read-only) structures. Irrespective of any 
other changes in ATS, the in-memory representations of DAG, Vertex, Task and 
TaskAttempt should not change.

 getAbsoluteSubmitTime() - is there a non-absolute timestamp elsewhere? Maybe 
 simplify function names?
- Yes, getSubmitTime() would return the time relative to the DAG start time. 
This would be useful when drawing swimlane diagrams, for instance. Renamed to 
getAbsStartTime() for now (any suggestions?)

 Could you clarify why most classes are marked public?
- All info objects would be public (evolving) as the analyzer code would rely 
on these in-memory objects.

 void setTaskInfo(TaskInfo taskInfo)
- As mentioned earlier, the zip file can have an arbitrary number of part files. 
Each part file is parsed and an in-memory POJO is created. Before returning the 
final DAG (in-memory structure), we need to link tasks to attempts, vertices to 
the DAG, etc. These links are made via these methods, which are not publicly 
exposed.
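The two-phase approach (parse every part file, then link objects) can be sketched as follows. `LinkingSketch`, its nested `TaskInfo`/`AttemptInfo` and `link()` are hypothetical simplifications, not the classes in the TEZ-2076 patch; the point is that getters are public while the linking setters stay package-private, so only code in the parser's package can wire tasks and attempts together.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, minimal info objects; not the classes in the TEZ-2076 patch.
class LinkingSketch {
    static final class TaskInfo {
        final String taskId;
        private final List<AttemptInfo> attempts = new ArrayList<>();

        TaskInfo(String taskId) { this.taskId = taskId; }

        public List<AttemptInfo> getAttempts() { return Collections.unmodifiableList(attempts); }

        // Package-private: only code in the parser's package may add links.
        void addAttempt(AttemptInfo attempt) { attempts.add(attempt); }
    }

    static final class AttemptInfo {
        final String attemptId;
        final String taskId;        // parent id as read from the part file
        private TaskInfo taskInfo;  // resolved only after every part has been parsed

        AttemptInfo(String attemptId, String taskId) {
            this.attemptId = attemptId;
            this.taskId = taskId;
        }

        public TaskInfo getTaskInfo() { return taskInfo; }

        void setTaskInfo(TaskInfo taskInfo) { this.taskInfo = taskInfo; }
    }

    /** Second pass: tasks and attempts may come from different part files, so link at the end. */
    static void link(Map<String, TaskInfo> tasksById, List<AttemptInfo> attempts) {
        for (AttemptInfo attempt : attempts) {
            TaskInfo task = tasksById.get(attempt.taskId);
            if (task == null) {
                throw new IllegalStateException(
                    "No task " + attempt.taskId + " for attempt " + attempt.attemptId);
            }
            attempt.setTaskInfo(task);
            task.addAttempt(attempt);
        }
    }

    public static void main(String[] args) {
        Map<String, TaskInfo> tasks = new HashMap<>();
        tasks.put("task_1", new TaskInfo("task_1"));
        List<AttemptInfo> attempts = List.of(
            new AttemptInfo("attempt_1_0", "task_1"),
            new AttemptInfo("attempt_1_1", "task_1"));
        link(tasks, attempts);
        System.out.println(tasks.get("task_1").getAttempts().size()); // 2
        System.out.println(attempts.get(0).getTaskInfo().taskId);     // task_1
    }
}
```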

 it would be good to try the tool with invalid data, corrupt zip files, etc 
 to ensure that there is useful error messages.
- In case of a corrupt file, it would throw an exception, e.g.
{noformat}
Exception in thread "main" org.apache.tez.dag.api.TezException: 
java.util.zip.ZipException: error in opening zip 

Failed: TEZ-2401 PreCommit Build #613

2015-05-04 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2401
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/613/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2779 lines...]



{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12730188/TEZ-2401.2.patch
  against master revision c411e4e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/613//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/613//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
67a235d89a770d60cab98d55b71a4022a84c1d8c logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #612
Archived 44 artifacts
Archive block size is 32768
Received 6 blocks and 2566260 bytes
Compression is 7.1%
Took 1.4 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2076) Tez framework to extract/analyze data stored in ATS for specific dag

2015-05-04 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526833#comment-14526833
 ] 

TezQA commented on TEZ-2076:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12730195/TEZ-2076.10.patch
  against master revision c411e4e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/614//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/614//console

This message is automatically generated.

 Tez framework to extract/analyze data stored in ATS for specific dag
 

 Key: TEZ-2076
 URL: https://issues.apache.org/jira/browse/TEZ-2076
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: TEZ-2076.1.patch, TEZ-2076.10.patch, TEZ-2076.2.patch, 
 TEZ-2076.3.patch, TEZ-2076.4.patch, TEZ-2076.5.patch, TEZ-2076.6.patch, 
 TEZ-2076.7.patch, TEZ-2076.8.patch, TEZ-2076.9.patch, TEZ-2076.WIP.2.patch, 
 TEZ-2076.WIP.3.patch, TEZ-2076.WIP.patch


 - Users should be able to download ATS data pertaining to a DAG from Tez-UI 
 (more like a zip file containing DAG/Vertex/Task/TaskAttempt info).
 - This can be plugged to an analyzer which parses the data, adds semantics 
 and provides an in-memory representation for further analysis.
 - This will enable to write different analyzer rules, which can be run on top 
 of this in-memory representation to come up with analysis on the DAG.
 - Results of this analyzer rules can be rendered on to UI (standalone webapp) 
 later point in time.





[jira] [Commented] (TEZ-2404) Handle DataMovementEvent before its TaskAttemptCompletedEvent

2015-05-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526916#comment-14526916
 ] 

Bikas Saha commented on TEZ-2404:
-

TEZ-1897 is not enabled yet, so we don't have to fix this immediately. We can 
use the time to explore other solutions that don't involve routing the same 
event twice. E.g. when the task completes, it sends an event to its vertex so 
that the vertex can increment its completed task count. Can the vertex use that 
to mark the successful attempt as done in the history logs? Logically, from 
what I see, the vertex is using the task attempt completed event as a marker 
for the completion of the successful attempt's history events, right? This 
approach may mean that an unsuccessful attempt will not have a completion 
marker. Will that be a problem? Maybe not, since we don't care about those 
attempts anyway. For work-preserving AM restart we can discard these events if 
the running task has not reconnected with the AM. In the non-work-preserving AM 
restart case we can always discard these events.

 Handle DataMovementEvent before its TaskAttemptCompletedEvent
 -

 Key: TEZ-2404
 URL: https://issues.apache.org/jira/browse/TEZ-2404
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Attachments: TEZ-2404-1.patch, TEZ-2404-2.patch


 TEZ-2325 routes TASK_ATTEMPT_COMPLETED_EVENT directly to the attempt, but this 
 causes a recovery issue. Recovery requires that the DataMovementEvent is handled 
 before the TaskAttemptCompletedEvent; otherwise the DataMovementEvent may be lost 
 during recovery and its dependent tasks may hang.
 There are 2 ways to fix this issue:
 1. Still route TaskAttemptCompletedEvent through the Vertex.
 2. Route DataMovementEvent before TaskAttemptCompletedEvent in 
 TezTaskAttemptListener.





[jira] [Updated] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED

2015-05-04 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2379:
-
Attachment: TEZ-2379.2.patch

Attached patch with handling for a killed attempt when the task is in the 
FAILED or KILLED state.

This seems safer, as KILLED and FAILED are already terminal states. A killed 
attempt at SUCCEEDED is already handled properly.

[~bikassaha] please review 
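The idea behind the patch, treating a late T_ATTEMPT_KILLED as a no-op once the task is already FAILED or KILLED instead of raising an invalid-transition error, can be illustrated with a toy state machine. This is a hypothetical, deliberately simplified model using plain enums: it is not Tez's `TaskImpl` or YARN's `StateMachineFactory`, only `T_ATTEMPT_KILLED` is taken from the log, it does not model attempt retries, and it leaves out the SUCCEEDED case, which the comment above says is already handled.

```java
// Hypothetical, deliberately simplified task state machine; not Tez's TaskImpl.
class TaskStateSketch {
    enum State { RUNNING, SUCCEEDED, FAILED, KILLED }

    // Only T_ATTEMPT_KILLED comes from the log; the other event names are illustrative.
    enum EventType { KILL_TASK, ATTEMPT_SUCCEEDED, ATTEMPT_FAILED_FINAL, T_ATTEMPT_KILLED }

    private State state = State.RUNNING;

    State handle(EventType event) {
        switch (state) {
            case RUNNING:
                return handleRunning(event);
            case SUCCEEDED:
                // Tez already has its own handling here; this sketch does not model it.
                throw new UnsupportedOperationException("SUCCEEDED is not modelled in this sketch");
            default: // FAILED or KILLED: terminal states
                if (event == EventType.T_ATTEMPT_KILLED) {
                    return state; // a late attempt kill cannot change a finished task: ignore it
                }
                throw new IllegalStateException("Invalid event: " + event + " at " + state);
        }
    }

    private State handleRunning(EventType event) {
        switch (event) {
            case KILL_TASK:            state = State.KILLED;    break;
            case ATTEMPT_SUCCEEDED:    state = State.SUCCEEDED; break;
            case ATTEMPT_FAILED_FINAL: state = State.FAILED;    break;
            case T_ATTEMPT_KILLED:     break; // a new attempt would be scheduled here (not modelled)
        }
        return state;
    }

    public static void main(String[] args) {
        TaskStateSketch task = new TaskStateSketch();
        task.handle(EventType.KILL_TASK);                            // e.g. the job is killed mid-run
        System.out.println(task.handle(EventType.T_ATTEMPT_KILLED)); // KILLED, no exception
    }
}
```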

 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 T_ATTEMPT_KILLED at KILLED
 --

 Key: TEZ-2379
 URL: https://issues.apache.org/jira/browse/TEZ-2379
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Hitesh Shah
Priority: Blocker
 Attachments: TEZ-2379.1.patch, TEZ-2379.2.patch


 {noformat}
 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: 
 Can't handle this event at current state for 
 task_1429683757595_0479_1_03_13
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 T_ATTEMPT_KILLED at KILLED
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
 at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853)
 at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106)
 at 
 org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874)
 at 
 org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860)
 at 
 org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
 at 
 org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
 at java.lang.Thread.run(Thread.java:745)
 {noformat}
 Additional notes:
 
 Hive - latest build 
 Tez - master
 tpch-200 gb scale q_17 (kill the job in the middle of execution)





[jira] [Commented] (TEZ-2407) Drop references to the old DataInputBuffer impl in PipelinedSorter

2015-05-04 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526921#comment-14526921
 ] 

Hitesh Shah commented on TEZ-2407:
--

Any reason why this should not be targeted to 0.7.0 or a 0.7.x release?

 Drop references to the old DataInputBuffer impl in PipelinedSorter
 --

 Key: TEZ-2407
 URL: https://issues.apache.org/jira/browse/TEZ-2407
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan







[jira] [Commented] (TEZ-2405) PipelinedSorter can throw NPE with custom comparator

2015-05-04 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526922#comment-14526922
 ] 

Hitesh Shah commented on TEZ-2405:
--

Does this affect anyone using PipelinedSorter in 0.5 or 0.6?

 PipelinedSorter can throw NPE with custom comparator
 ---

 Key: TEZ-2405
 URL: https://issues.apache.org/jira/browse/TEZ-2405
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Priority: Critical
 Fix For: 0.7.0

 Attachments: TEZ-2405.1.patch


 If custom comparators are used, PipelinedSorter can throw an NPE, depending on
 the custom comparator implementation.
 {noformat}
 ], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.NullPointerException
   at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SpanIterator.compareTo(PipelinedSorter.java:837)
   at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SpanIterator.compareTo(PipelinedSorter.java:767)
   at java.util.PriorityQueue.siftUpComparable(PriorityQueue.java:637)
   at java.util.PriorityQueue.siftUp(PriorityQueue.java:629)
   at java.util.PriorityQueue.offer(PriorityQueue.java:329)
   at java.util.PriorityQueue.add(PriorityQueue.java:306)
   at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SpanMerger.add(PipelinedSorter.java:996)
   at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SpanMerger.next(PipelinedSorter.java:1065)
   at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$PartitionFilter.next(PipelinedSorter.java:936)
   at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.spill(PipelinedSorter.java:366)
   at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.flush(PipelinedSorter.java:406)
   at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.close(OrderedPartitionedKVOutput.java:183)
   at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.close(LogicalIOProcessorRuntimeTask.java:355)
   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 {noformat}
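Because `PriorityQueue.add` runs a comparison while sifting the new element into place (the `siftUpComparable` frame in the trace), any span comparison that dereferences a null key fails inside `add()`. The report does not say which value was null, so the following is only a minimal sketch of a heap-based k-way merge, assuming String keys and using a null "current key" to mean an exhausted span. `SpanMergeSketch`, `Span` and `merge` are hypothetical names, not PipelinedSorter code, and the sketch passes a `Comparator` to the queue rather than making spans `Comparable`. It illustrates one defensive rule: never let an empty or exhausted span into the heap, so the user comparator only ever sees real keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class SpanMergeSketch {

    /** Hypothetical sorted span: an iterator plus its current key (null once exhausted). */
    static final class Span {
        private final Iterator<String> it;
        String current;

        Span(List<String> sortedKeys) {
            this.it = sortedKeys.iterator();
            advance();
        }

        /** Moves to the next key; returns false (and sets current to null) when exhausted. */
        boolean advance() {
            current = it.hasNext() ? it.next() : null;
            return current != null;
        }
    }

    /** k-way merge of sorted spans, ordering the heap by each span's current key. */
    static List<String> merge(List<List<String>> spans, Comparator<String> userComparator) {
        // The comparator runs inside add()/poll(), so it must never be handed a null key.
        Comparator<Span> byCurrentKey = (a, b) -> userComparator.compare(a.current, b.current);
        PriorityQueue<Span> heap = new PriorityQueue<>(byCurrentKey);
        for (List<String> keys : spans) {
            Span span = new Span(keys);
            if (span.current != null) {   // skip empty spans instead of comparing null keys
                heap.add(span);
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Span smallest = heap.poll();
            out.add(smallest.current);
            if (smallest.advance()) {     // re-insert only while the span still has keys
                heap.add(smallest);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A "custom" comparator that dereferences its arguments, so it would NPE on null.
        Comparator<String> byLengthThenText =
                Comparator.comparingInt(String::length).thenComparing(Comparator.naturalOrder());
        List<List<String>> spans = Arrays.asList(
                Arrays.asList("a", "ccc"),
                new ArrayList<String>(),   // empty span: never enters the heap
                Arrays.asList("bb", "dddd"));
        System.out.println(merge(spans, byLengthThenText));
    }
}
```

Running `main` prints `[a, bb, ccc, dddd]`. Whether the real NPE comes from an exhausted span, a null key, or the custom comparator itself is not established by the report.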





[jira] [Updated] (TEZ-2325) Route status update event directly to the attempt

2015-05-04 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2325:
-
Priority: Major  (was: Critical)

 Route status update event directly to the attempt 
 --

 Key: TEZ-2325
 URL: https://issues.apache.org/jira/browse/TEZ-2325
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Prakash Ramachandran
 Fix For: 0.7.0

 Attachments: TEZ-2325.1.patch, TEZ-2325.2.patch, TEZ-2325.3.patch, 
 TEZ-2325.4.patch


 Today, all events from the attempt heartbeat are routed to the vertex, which
 then routes any status update events to the attempt. This extra hop is
 unnecessary and can potentially cause events to be processed out of order. We
 could route the status update events directly to the attempts.
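A minimal sketch of the proposed routing, assuming a heartbeat is an ordered list of events and that only status updates bypass the vertex. The `Kind` and `Event` types and the two `Consumer` sinks are hypothetical stand-ins for the attempt and vertex handlers, not Tez classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class HeartbeatRoutingSketch {

    /** Hypothetical event kinds; only STATUS_UPDATE is routed straight to the attempt. */
    enum Kind { STATUS_UPDATE, DATA_MOVEMENT }

    static final class Event {
        final Kind kind;
        final String attemptId;

        Event(Kind kind, String attemptId) {
            this.kind = kind;
            this.attemptId = attemptId;
        }

        @Override
        public String toString() {
            return kind + "@" + attemptId;
        }
    }

    /** Sends status updates directly to the attempt; everything else still goes to the vertex. */
    static void route(List<Event> heartbeat, Consumer<Event> toAttempt, Consumer<Event> toVertex) {
        for (Event e : heartbeat) {
            if (e.kind == Kind.STATUS_UPDATE) {
                toAttempt.accept(e);   // no extra hop through the vertex
            } else {
                toVertex.accept(e);
            }
        }
    }

    public static void main(String[] args) {
        List<Event> attemptSink = new ArrayList<>();
        List<Event> vertexSink = new ArrayList<>();
        List<Event> heartbeat = Arrays.asList(
                new Event(Kind.STATUS_UPDATE, "attempt_1"),
                new Event(Kind.DATA_MOVEMENT, "attempt_1"),
                new Event(Kind.STATUS_UPDATE, "attempt_1"));
        route(heartbeat, attemptSink::add, vertexSink::add);
        System.out.println("attempt: " + attemptSink); // attempt: [STATUS_UPDATE@attempt_1, STATUS_UPDATE@attempt_1]
        System.out.println("vertex: " + vertexSink);   // vertex: [DATA_MOVEMENT@attempt_1]
    }
}
```

Each sink keeps heartbeat order for its own events; the sketch says nothing about the relative order across the two paths once they are handled asynchronously.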





[jira] [Updated] (TEZ-2325) Route status update event directly to the attempt

2015-05-04 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2325:
-
Priority: Critical  (was: Major)

 Route status update event directly to the attempt 
 --

 Key: TEZ-2325
 URL: https://issues.apache.org/jira/browse/TEZ-2325
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Prakash Ramachandran
Priority: Critical
 Fix For: 0.7.0

 Attachments: TEZ-2325.1.patch, TEZ-2325.2.patch, TEZ-2325.3.patch, 
 TEZ-2325.4.patch







[jira] [Commented] (TEZ-2404) Handle DataMovementEvent before its TaskAttemptCompletedEvent

2015-05-04 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526931#comment-14526931
 ] 

Hitesh Shah commented on TEZ-2404:
--

Bumping up priority, as this means recovery is potentially broken.

[~zjffdu] It looks like we need a recovery-related test to ensure that data
movement events are always stored before a task completion event.

 Handle DataMovementEvent before its TaskAttemptCompletedEvent
 -

 Key: TEZ-2404
 URL: https://issues.apache.org/jira/browse/TEZ-2404
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Critical
 Attachments: TEZ-2404-1.patch, TEZ-2404-2.patch


 TEZ-2325 routes TASK_ATTEMPT_COMPLETED_EVENT directly to the attempt, but this
 would cause a recovery issue. Recovery requires that the DataMovement event is
 handled before the TaskAttemptCompletedEvent; otherwise the DataMovement event
 may be lost during recovery and cause its dependent tasks to hang.
 There are two ways to fix this issue:
 1. Still route TaskAttemptCompletedEvent through the Vertex.
 2. Route DataMovementEvent before TaskAttemptCompletedEvent in
 TezTaskAttemptListener.
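Option 2 amounts to an ordering invariant: for a given attempt, every DataMovementEvent must be handled (and therefore recorded for recovery) before that attempt's TaskAttemptCompletedEvent. A minimal sketch, assuming one attempt's events are available as a list in handling order: `completionLast` reorders them with a stable sort, and `dataMovementBeforeCompletion` is the kind of check the recovery test suggested in the comment above could assert. All names are hypothetical, not TezTaskAttemptListener code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class EventOrderingSketch {

    /** Hypothetical kinds of events sent by one task attempt. */
    enum Kind { DATA_MOVEMENT, STATUS_UPDATE, TASK_ATTEMPT_COMPLETED }

    /**
     * Returns a copy with every TASK_ATTEMPT_COMPLETED moved after all other events.
     * List.sort is stable, so the other events keep their original relative order.
     */
    static List<Kind> completionLast(List<Kind> events) {
        List<Kind> ordered = new ArrayList<>(events);
        ordered.sort(Comparator.comparingInt(k -> k == Kind.TASK_ATTEMPT_COMPLETED ? 1 : 0));
        return ordered;
    }

    /** The invariant a recovery test could assert: no DATA_MOVEMENT after a completion. */
    static boolean dataMovementBeforeCompletion(List<Kind> handledOrder) {
        boolean completed = false;
        for (Kind k : handledOrder) {
            if (k == Kind.TASK_ATTEMPT_COMPLETED) {
                completed = true;
            } else if (k == Kind.DATA_MOVEMENT && completed) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Completion handled on a faster path, ahead of the second data movement event.
        List<Kind> handled = Arrays.asList(
                Kind.DATA_MOVEMENT, Kind.TASK_ATTEMPT_COMPLETED, Kind.DATA_MOVEMENT);
        System.out.println(dataMovementBeforeCompletion(handled));                 // false
        System.out.println(completionLast(handled)); // [DATA_MOVEMENT, DATA_MOVEMENT, TASK_ATTEMPT_COMPLETED]
        System.out.println(dataMovementBeforeCompletion(completionLast(handled))); // true
    }
}
```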





[jira] [Comment Edited] (TEZ-2404) Handle DataMovementEvent before its TaskAttemptCompletedEvent

2015-05-04 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526931#comment-14526931
 ] 

Hitesh Shah edited comment on TEZ-2404 at 5/4/15 5:54 PM:
--

Bumping up priority, as this means recovery is potentially broken.

[~zjffdu] It looks like we need a recovery-related test to ensure that all data
movement events are always stored before a task completion event.


was (Author: hitesh):
Bumping up priority, as this means recovery is potentially broken.

[~zjffdu] It looks like we need a recovery-related test to ensure that data
movement events are always stored before a task completion event.

 Handle DataMovementEvent before its TaskAttemptCompletedEvent
 -

 Key: TEZ-2404
 URL: https://issues.apache.org/jira/browse/TEZ-2404
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Critical
 Attachments: TEZ-2404-1.patch, TEZ-2404-2.patch







[jira] [Commented] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED

2015-05-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526935#comment-14526935
 ] 

Bikas Saha commented on TEZ-2379:
-

LGTM pending Jenkins. If possible, could you put a comment in the TaskImpl
state machine summarizing the other scenario, in which we could ignore the
attempt-killed event in the attempt if the attempt has already succeeded? In
case we hit this issue in the future in some other scenario, that may provide
some context to simplify debugging.

 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 T_ATTEMPT_KILLED at KILLED
 --

 Key: TEZ-2379
 URL: https://issues.apache.org/jira/browse/TEZ-2379
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Hitesh Shah
Priority: Blocker
 Attachments: TEZ-2379.1.patch, TEZ-2379.2.patch







[jira] [Commented] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED

2015-05-04 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526952#comment-14526952
 ] 

Hitesh Shah commented on TEZ-2379:
--

Will update the final patch with a note about the race in the kill transition
via killUnfinishedAttempt.

 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 T_ATTEMPT_KILLED at KILLED
 --

 Key: TEZ-2379
 URL: https://issues.apache.org/jira/browse/TEZ-2379
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Hitesh Shah
Priority: Blocker
 Attachments: TEZ-2379.1.patch, TEZ-2379.2.patch







[jira] [Updated] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED

2015-05-04 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2379:
-
Attachment: TEZ-2379.3.patch

Final patch with doc comment. 

 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 T_ATTEMPT_KILLED at KILLED
 --

 Key: TEZ-2379
 URL: https://issues.apache.org/jira/browse/TEZ-2379
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Hitesh Shah
Priority: Blocker
 Attachments: TEZ-2379.1.patch, TEZ-2379.2.patch, TEZ-2379.3.patch







[jira] [Commented] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED

2015-05-04 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527033#comment-14527033
 ] 

Siddharth Seth commented on TEZ-2379:
-

One thing to consider here is that the individual state machines should be
complete in themselves and should not make assumptions about other state
machines. This makes them a lot easier to reason about (we aren't there yet,
though).
TaskImpl
- Already knows how to handle ATTEMPT_KILLED and ATTEMPT_FAILED in the SUCCESS
state. It will, however, error out in the FAILED or KILLED state, even though
there is nothing to be done there if these events are received.

TaskAttemptImpl
- If moving from one 'external' state to another, it should inform the Task
and let it deal with the state change.
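A minimal sketch of the "complete state machine" idea, using a hypothetical four-state task with a table-driven transition map; it is not Hadoop's `StateMachineFactory` or Tez's `TaskImpl`, and transitions out of SUCCEEDED are deliberately omitted. With `absorbLateAttemptEvents` off, a late ATTEMPT_KILLED after the task was killed is rejected, which has the same shape as the reported `Invalid event: T_ATTEMPT_KILLED at KILLED`. With it on, the terminal FAILED and KILLED states register no-op self-transitions for attempt events, since there is nothing left to do.

```java
import java.util.EnumMap;
import java.util.Map;

public class TaskStateMachineSketch {

    enum State { RUNNING, SUCCEEDED, FAILED, KILLED }

    enum Event { ATTEMPT_SUCCEEDED, ATTEMPT_FAILED, ATTEMPT_KILLED, KILL }

    // (current state, event) -> next state; a missing entry is an invalid transition.
    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State state = State.RUNNING;

    TaskStateMachineSketch(boolean absorbLateAttemptEvents) {
        for (State s : State.values()) {
            table.put(s, new EnumMap<>(Event.class));
        }
        table.get(State.RUNNING).put(Event.ATTEMPT_SUCCEEDED, State.SUCCEEDED);
        table.get(State.RUNNING).put(Event.ATTEMPT_FAILED, State.FAILED);
        table.get(State.RUNNING).put(Event.KILL, State.KILLED);
        if (absorbLateAttemptEvents) {
            // Terminal states swallow late attempt events instead of erroring out.
            for (State terminal : new State[] {State.FAILED, State.KILLED}) {
                table.get(terminal).put(Event.ATTEMPT_FAILED, terminal);
                table.get(terminal).put(Event.ATTEMPT_KILLED, terminal);
            }
        }
    }

    /** Applies one event and returns the new state. */
    State handle(Event event) {
        State next = table.get(state).get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + state);
        }
        state = next;
        return state;
    }

    public static void main(String[] args) {
        // The sequence from the report: the task is killed, then its attempt's kill event arrives.
        TaskStateMachineSketch strict = new TaskStateMachineSketch(false);
        strict.handle(Event.KILL);
        try {
            strict.handle(Event.ATTEMPT_KILLED);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Invalid event: ATTEMPT_KILLED at KILLED
        }

        TaskStateMachineSketch complete = new TaskStateMachineSketch(true);
        complete.handle(Event.KILL);
        System.out.println(complete.handle(Event.ATTEMPT_KILLED)); // KILLED
    }
}
```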


 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 T_ATTEMPT_KILLED at KILLED
 --

 Key: TEZ-2379
 URL: https://issues.apache.org/jira/browse/TEZ-2379
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Hitesh Shah
Priority: Blocker
 Attachments: TEZ-2379.1.patch, TEZ-2379.2.patch, TEZ-2379.3.patch







Failed: TEZ-2379 PreCommit Build #615

2015-05-04 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2379
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/615/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2584 lines...]




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12730216/TEZ-2379.2.patch
  against master revision c411e4e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestFaultTolerance

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/615//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/615//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
005e3e67c11cbc11968bd2e985d4dadadc43f6bd logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #614
Archived 44 artifacts
Archive block size is 32768
Received 26 blocks and 1887404 bytes
Compression is 31.1%
Took 1.5 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
6 tests failed.
REGRESSION:  org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithExit

Error Message:
test timed out after 6 milliseconds

Stack Trace:
java.lang.Exception: test timed out after 6 milliseconds
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.ipc.Client$Connection.handleConnectionFailure(Client.java:853)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:626)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
at org.apache.hadoop.ipc.Client.call(Client.java:1438)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy91.getDAGStatus(Unknown Source)
at org.apache.tez.dag.api.client.rpc.DAGClientRPCImpl.getDAGStatusViaAM(DAGClientRPCImpl.java:175)
at org.apache.tez.dag.api.client.rpc.DAGClientRPCImpl.getDAGStatus(DAGClientRPCImpl.java:94)
at org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatusViaAM(DAGClientImpl.java:350)
at org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatusInternal(DAGClientImpl.java:217)
at org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatus(DAGClientImpl.java:262)
at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:127)
at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:114)
at org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithExit(TestFaultTolerance.java:248)


REGRESSION:  org.apache.tez.test.TestFaultTolerance.testInputFailureRerunCanSendOutputToTwoDownstreamVertices

Error Message:
test timed out after 6 milliseconds

Stack Trace:
java.lang.Exception: test timed out after 6 milliseconds
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.ipc.Client$Connection.handleConnectionFailure(Client.java:853)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:626)
   

[jira] [Commented] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED

2015-05-04 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527041#comment-14527041
 ] 

TezQA commented on TEZ-2379:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12730216/TEZ-2379.2.patch
  against master revision c411e4e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestFaultTolerance

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/615//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/615//console

This message is automatically generated.

 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 T_ATTEMPT_KILLED at KILLED
 --

 Key: TEZ-2379
 URL: https://issues.apache.org/jira/browse/TEZ-2379
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Hitesh Shah
Priority: Blocker
 Attachments: TEZ-2379.1.patch, TEZ-2379.2.patch, TEZ-2379.3.patch







[jira] [Commented] (TEZ-1897) Create a concurrent version of AsyncDispatcher

2015-05-04 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527068#comment-14527068
 ] 

Siddharth Seth commented on TEZ-1897:
-

Looks like this went in with concurrentDispatchers enabled. Can you please undo
that bit?

 Create a concurrent version of AsyncDispatcher
 --

 Key: TEZ-1897
 URL: https://issues.apache.org/jira/browse/TEZ-1897
 Project: Apache Tez
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Bikas Saha
 Fix For: 0.7.0

 Attachments: TEZ-1897.1.patch, TEZ-1897.2.patch, TEZ-1897.3.patch, 
 TEZ-1897.4.patch, TEZ-1897.5.patch, TEZ-1897.6.patch, TEZ-1897.7.patch, 
 TEZ-1897.8.patch


 Currently, AsyncDispatcher processes events on a single thread. For events that
 can be executed in parallel, e.g. vertex manager events, allowing higher
 concurrency may be beneficial.
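One common way to add concurrency without breaking per-entity ordering is to hash each event's key (for example a vertex id) onto one of several single-threaded lanes: events for the same key stay in FIFO order, while different keys can run in parallel. This is a minimal sketch of that technique under that assumption, not the AsyncDispatcher change committed for this issue. `KeyedDispatcherSketch`, its `concurrent` flag (false meaning a single lane, in the spirit of the later addendum that turned the feature off by default) and the lane count are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class KeyedDispatcherSketch {

    // One single-threaded executor per lane: tasks on a lane run strictly FIFO.
    private final List<ExecutorService> lanes = new ArrayList<>();

    /** Hypothetical flag: concurrent=false keeps one lane, i.e. a single dispatch thread. */
    KeyedDispatcherSketch(boolean concurrent, int numLanes) {
        int n = concurrent ? Math.max(1, numLanes) : 1;
        for (int i = 0; i < n; i++) {
            lanes.add(Executors.newSingleThreadExecutor());
        }
    }

    /** Events with the same key always map to the same lane, so their order is kept. */
    void dispatch(String key, Runnable handler) {
        int lane = Math.floorMod(key.hashCode(), lanes.size());
        lanes.get(lane).execute(handler);
    }

    /** Stops accepting events and waits for already-queued events to finish. */
    void shutdownAndWait() throws InterruptedException {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        for (ExecutorService lane : lanes) {
            lane.awaitTermination(10, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        KeyedDispatcherSketch dispatcher = new KeyedDispatcherSketch(true, 4);
        List<String> seenForVertex1 = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < 5; i++) {
            final int seq = i;
            dispatcher.dispatch("vertex_1", () -> seenForVertex1.add("vertex_1#" + seq));
            // Events for another key may run in parallel on a different lane.
            dispatcher.dispatch("vertex_2", () -> { });
        }
        dispatcher.shutdownAndWait();
        System.out.println(seenForVertex1);
    }
}
```

Running `main` prints `[vertex_1#0, vertex_1#1, vertex_1#2, vertex_1#3, vertex_1#4]`: ordering is only guaranteed per key, which is why events that must be seen in order by one entity need to share a key.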





[jira] [Commented] (TEZ-1897) Create a concurrent version of AsyncDispatcher

2015-05-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527081#comment-14527081
 ] 

Bikas Saha commented on TEZ-1897:
-

Thanks for catching it. My bad. Fixed.
commit 5218f481dba2a26c3aa5dd8f69285ab9da419dd1
Author: Bikas Saha bi...@apache.org
Date:   Mon May 4 12:05:39 2015 -0700

TEZ-1897 addendum to turn off by default . Create a concurrent version of 
AsyncDispatcher (bikas)



 Create a concurrent version of AsyncDispatcher
 --

 Key: TEZ-1897
 URL: https://issues.apache.org/jira/browse/TEZ-1897
 Project: Apache Tez
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Bikas Saha
 Fix For: 0.7.0

 Attachments: TEZ-1897.1.patch, TEZ-1897.2.patch, TEZ-1897.3.patch, 
 TEZ-1897.4.patch, TEZ-1897.5.patch, TEZ-1897.6.patch, TEZ-1897.7.patch, 
 TEZ-1897.8.patch







Failed: TEZ-2379 PreCommit Build #616

2015-05-04 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2379
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/616/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2584 lines...]




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12730222/TEZ-2379.3.patch
  against master revision c411e4e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestFaultTolerance

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/616//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/616//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
cc2efeeb76b37c65ffb7373e0d2780bdd0d8ade5 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #614
Archived 44 artifacts
Archive block size is 32768
Received 6 blocks and 2540593 bytes
Compression is 7.2%
Took 1.8 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
6 tests failed.
FAILED:  org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithExit

Error Message:
test timed out after 6 milliseconds

Stack Trace:
java.lang.Exception: test timed out after 6 milliseconds
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.ipc.Client$Connection.handleConnectionFailure(Client.java:853)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:626)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
at org.apache.hadoop.ipc.Client.call(Client.java:1438)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy91.getDAGStatus(Unknown Source)
at org.apache.tez.dag.api.client.rpc.DAGClientRPCImpl.getDAGStatusViaAM(DAGClientRPCImpl.java:175)
at org.apache.tez.dag.api.client.rpc.DAGClientRPCImpl.getDAGStatus(DAGClientRPCImpl.java:94)
at org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatusViaAM(DAGClientImpl.java:350)
at org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatusInternal(DAGClientImpl.java:217)
at org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatus(DAGClientImpl.java:262)
at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:127)
at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:114)
at org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithExit(TestFaultTolerance.java:248)


FAILED:  org.apache.tez.test.TestFaultTolerance.testInputFailureRerunCanSendOutputToTwoDownstreamVertices

Error Message:
test timed out after 6 milliseconds

Stack Trace:
java.lang.Exception: test timed out after 6 milliseconds
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.ipc.Client$Connection.handleConnectionFailure(Client.java:853)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:626)
at 

[jira] [Commented] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED

2015-05-04 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527095#comment-14527095
 ] 

TezQA commented on TEZ-2379:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12730222/TEZ-2379.3.patch
  against master revision c411e4e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestFaultTolerance

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/616//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/616//console

This message is automatically generated.

 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 T_ATTEMPT_KILLED at KILLED
 --

 Key: TEZ-2379
 URL: https://issues.apache.org/jira/browse/TEZ-2379
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Hitesh Shah
Priority: Blocker
 Attachments: TEZ-2379.1.patch, TEZ-2379.2.patch, TEZ-2379.3.patch


 {noformat}
 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: 
 Can't handle this event at current state for 
 task_1429683757595_0479_1_03_13
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 T_ATTEMPT_KILLED at KILLED
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
 at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853)
 at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106)
 at 
 org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874)
 at 
 org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860)
 at 
 org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
 at 
 org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
 at java.lang.Thread.run(Thread.java:745)
 {noformat}
 Additional notes:
 
 Hive - latest build 
 Tez - master
 tpch-200 gb scale q_17 (kill the job in the middle of execution)
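 The sketch below shows why a late T_ATTEMPT_KILLED produces this error. It uses a simplified, hypothetical task state machine: SimpleTask, TaskState, TaskEvent and InvalidTransitionException are illustrative names, not Tez's TaskImpl or YARN's StateMachineFactory. Once a task has reached KILLED, another attempt's kill notification can still arrive on the dispatcher. Unless the KILLED state explicitly tolerates that event, the state machine raises an invalid-transition error. Registering the late event as a no-op in the terminal state is one common remedy; the attached patches may take a different approach.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified states and events; not Tez's TaskImpl.
enum TaskState { RUNNING, KILL_WAIT, KILLED, SUCCEEDED }

enum TaskEvent { T_KILL, T_ATTEMPT_KILLED, T_ATTEMPT_SUCCEEDED }

class InvalidTransitionException extends RuntimeException {
  InvalidTransitionException(TaskEvent event, TaskState state) {
    super("Invalid event: " + event + " at " + state);
  }
}

class SimpleTask {
  private TaskState state = TaskState.RUNNING;
  // Events that are dropped (a no-op self-transition) in a given state.
  private final Map<TaskState, Set<TaskEvent>> ignored = new EnumMap<>(TaskState.class);

  SimpleTask(boolean tolerateLateKills) {
    if (tolerateLateKills) {
      // A late kill notification for an already-KILLED task is harmless.
      ignored.put(TaskState.KILLED, EnumSet.of(TaskEvent.T_ATTEMPT_KILLED));
    }
  }

  TaskState state() {
    return state;
  }

  void handle(TaskEvent event) {
    Set<TaskEvent> dropped = ignored.get(state);
    if (dropped != null && dropped.contains(event)) {
      return;
    }
    switch (state) {
      case RUNNING:
        if (event == TaskEvent.T_KILL) {
          state = TaskState.KILL_WAIT;
          return;
        }
        if (event == TaskEvent.T_ATTEMPT_SUCCEEDED) {
          state = TaskState.SUCCEEDED;
          return;
        }
        break;
      case KILL_WAIT:
        // Simplification: the first attempt-killed event completes the kill.
        if (event == TaskEvent.T_ATTEMPT_KILLED) {
          state = TaskState.KILLED;
          return;
        }
        break;
      default:
        // Terminal states accept nothing unless listed in 'ignored'.
        break;
    }
    throw new InvalidTransitionException(event, state);
  }
}
```

 With tolerateLateKills set to false, the sequence T_KILL, T_ATTEMPT_KILLED, T_ATTEMPT_KILLED throws "Invalid event: T_ATTEMPT_KILLED at KILLED" on the third call. With it set to true, the third event is dropped and the task stays KILLED.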



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED

2015-05-04 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527102#comment-14527102
 ] 

Hitesh Shah commented on TEZ-2379:
--

Re-ran TestFaultTolerance locally without any problems. Looks like it probably 
failed due to the concurrent AsyncDispatcher being turned on by default. 



 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 T_ATTEMPT_KILLED at KILLED
 --

 Key: TEZ-2379
 URL: https://issues.apache.org/jira/browse/TEZ-2379
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Hitesh Shah
Priority: Blocker
 Attachments: TEZ-2379.1.patch, TEZ-2379.2.patch, TEZ-2379.3.patch


 {noformat}
 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: 
 Can't handle this event at current state for 
 task_1429683757595_0479_1_03_13
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 T_ATTEMPT_KILLED at KILLED
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
 at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853)
 at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106)
 at 
 org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874)
 at 
 org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860)
 at 
 org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
 at 
 org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
 at java.lang.Thread.run(Thread.java:745)
 {noformat}
 Additional notes:
 
 Hive - latest build 
 Tez - master
 tpch-200 gb scale q_17 (kill the job in the middle of execution)





[jira] [Commented] (TEZ-2407) Drop references to the old DataInputBuffer impl in PipelinedSorter

2015-05-04 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527112#comment-14527112
 ] 

Gopal V commented on TEZ-2407:
--

This is code-cleanliness refactoring - this does not add performance or 
stability fixes.

I'm 90% done with my scale and stability testing of the new sorter, so late 
refactoring can only add the risk of introducing bugs deep inside the sorter.

I don't have enough weeks of testing left on my end. At best this would make 
the code more readable; at worst it would break the sorters.

We can retarget this for 0.7.x if you think there are enough QA weeks left to 
catch any late issues this might introduce.

 Drop references to the old DataInputBuffer impl in PipelinedSorter
 --

 Key: TEZ-2407
 URL: https://issues.apache.org/jira/browse/TEZ-2407
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan







[jira] [Commented] (TEZ-2405) PipelinedSorter can throw NPE with custom comparator

2015-05-04 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527124#comment-14527124
 ] 

Gopal V commented on TEZ-2405:
--

[~hitesh]: nope, this was introduced during the 0.7 release cycle in my WIP patch 
for TEZ-1593.

The TEZ-1593 issue was identified in 0.6 but was not fixed in the 0.6 release cycle 
as we wanted to do core fixes at the beginning of a release cycle rather than 
at the end.

 PipelinedSorter can throw NPE with custom comparator
 ---

 Key: TEZ-2405
 URL: https://issues.apache.org/jira/browse/TEZ-2405
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Priority: Critical
 Fix For: 0.7.0

 Attachments: TEZ-2405.1.patch


 If custom comparators are used, PipelinedSorter can throw an NPE, depending on 
 the custom comparator implementation.
 {noformat}
 ], TaskAttempt 1 failed, info=[Error: Failure while running 
 task:java.lang.NullPointerException
   at 
 org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SpanIterator.compareTo(PipelinedSorter.java:837)
   at 
 org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SpanIterator.compareTo(PipelinedSorter.java:767)
   at java.util.PriorityQueue.siftUpComparable(PriorityQueue.java:637)
   at java.util.PriorityQueue.siftUp(PriorityQueue.java:629)
   at java.util.PriorityQueue.offer(PriorityQueue.java:329)
   at java.util.PriorityQueue.add(PriorityQueue.java:306)
   at 
 org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SpanMerger.add(PipelinedSorter.java:996)
   at 
 org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SpanMerger.next(PipelinedSorter.java:1065)
   at 
 org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$PartitionFilter.next(PipelinedSorter.java:936)
   at 
 org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.spill(PipelinedSorter.java:366)
   at 
 org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.flush(PipelinedSorter.java:406)
   at 
 org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.close(OrderedPartitionedKVOutput.java:183)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.close(LogicalIOProcessorRuntimeTask.java:355)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 {noformat}
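 In the trace, the NPE surfaces inside SpanIterator.compareTo while PriorityQueue.add sifts a new entry into the merge heap. The sketch below shows a PriorityQueue-based k-way merge of sorted spans. It rests on a hypothesis, not the diagnosed root cause: the failure comes from a span cursor being compared before it is positioned on a real key. SpanCursor and SpanMergeSketch are hypothetical names. The guard never enqueues an exhausted or unpositioned cursor, so the user-supplied comparator is never handed a null key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical cursor over one sorted span; not Tez's SpanIterator.
class SpanCursor<K> {
  private final Iterator<K> it;
  private K current; // null until next() succeeds, and again once exhausted

  SpanCursor(Iterator<K> it) {
    this.it = it;
  }

  boolean next() {
    if (it.hasNext()) {
      current = it.next();
      return true;
    }
    current = null;
    return false;
  }

  K current() {
    return current;
  }
}

class SpanMergeSketch {
  // Merges already-sorted spans using a user-supplied key comparator.
  static <K> List<K> merge(List<List<K>> spans, Comparator<? super K> userComparator) {
    Comparator<SpanCursor<K>> byCurrent =
        (a, b) -> userComparator.compare(a.current(), b.current());
    PriorityQueue<SpanCursor<K>> heap = new PriorityQueue<>(byCurrent);
    for (List<K> span : spans) {
      SpanCursor<K> cursor = new SpanCursor<>(span.iterator());
      // Only enqueue cursors positioned on a real key; an empty span would
      // otherwise hand the comparator a null key.
      if (cursor.next()) {
        heap.add(cursor);
      }
    }
    List<K> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      SpanCursor<K> top = heap.poll();
      out.add(top.current());
      if (top.next()) {
        heap.add(top); // re-enqueue only while the span still has keys
      }
    }
    return out;
  }
}
```

 For example, merging [a, d], an empty span and [b, c, e] with a comparator that calls x.compareTo(y) yields [a, b, c, d, e]. Without the `if (cursor.next())` guard, the empty span's cursor would reach the comparator with a null key and throw an NPE.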





[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents

2015-05-04 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527677#comment-14527677
 ] 

Siddharth Seth commented on TEZ-776:


- Minor: {code}Target input indices. The number must match the number of 
events{code}
'count' or 'size of array' may make this a little clearer; 'number' is a little 
vague.
- Precondition checks to verify this?
- BroadcastEdgeManager - commonRouteMeta is set up via prepareForRouting. Not sure 
accessing this structure at a later point is thread safe. This goes away anyway if 
Broadcast/OneToOne are left unchanged.
- A bunch of repeated code between OneToOne, Broadcast, ScatterGather etc. in 
Edge.java. Looks like it's all the same (exploding the EventRouteMetadata).
- Not sure if the thread-safety concern applies to ScatterGather as well. That seems 
to be making changes within a lock, though. It seems fairly complicated; I assume 
that's all for caching and efficiency?
- There are several methods on EdgeManagerPluginContextOnDemand which don't need 
to be implemented/extended (the method on EdgeManagerPluginContext should be 
sufficient), e.g. initialize(), getContext, some of the routing methods.

- I'm still concerned about the access to taskEvents (taskEvents.size() and 
taskEvents.get()). This is an array list getting populated in one thread, and 
accessed in 30 others without a lock. ArrayList isn't supposed to be thread 
safe afaik. Will let someone else chime in here.
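The sketch below shows one way to address the concern above. It assumes a hypothetical append-only event list (GuardedEventList is an illustrative name, not a Tez class) written by one dispatcher thread and read by many others. A ReentrantReadWriteLock lets readers proceed concurrently with each other. Holding the read lock while copying also establishes a happens-before edge with the writer, so readers never see a stale backing array or a partially published element. CopyOnWriteArrayList would also be safe, but it copies the whole array on every append, which is costly for a list that grows by one event at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical append-only event store: one writer thread, many reader threads.
class GuardedEventList<E> {
  private final List<E> events = new ArrayList<>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  void add(E event) {
    lock.writeLock().lock();
    try {
      events.add(event);
    } finally {
      lock.writeLock().unlock();
    }
  }

  int size() {
    lock.readLock().lock();
    try {
      return events.size();
    } finally {
      lock.readLock().unlock();
    }
  }

  // Copies up to maxEvents events starting at fromIndex under the read lock,
  // so the size check and the element reads see one consistent state.
  List<E> getRange(int fromIndex, int maxEvents) {
    if (fromIndex < 0 || maxEvents < 0) {
      throw new IllegalArgumentException("fromIndex and maxEvents must be >= 0");
    }
    lock.readLock().lock();
    try {
      int available = events.size() - fromIndex;
      int count = Math.min(Math.max(available, 0), maxEvents);
      if (count == 0) {
        return new ArrayList<>();
      }
      return new ArrayList<>(events.subList(fromIndex, fromIndex + count));
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

Readers call getRange(fromIndex, maxEvents) instead of pairing an unlocked size() with get(i). The copy is then made while the writer is excluded.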

On TEZ-2409: I think it'll be better to get that done here itself. It's 
probably 10 more lines and removes the changes to Broadcast/OneToOne. TEZ-2409 
becomes a blocker for 0.7 anyway, and would end up reverting changes 
made here. The overall functionality is already tested by the various jobs that 
we run.



 Reduce AM mem usage caused by storing TezEvents
 ---

 Key: TEZ-776
 URL: https://issues.apache.org/jira/browse/TEZ-776
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Bikas Saha
 Attachments: TEZ-776.1.patch, TEZ-776.10.patch, TEZ-776.11.patch, 
 TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, TEZ-776.5.patch, 
 TEZ-776.6.A.patch, TEZ-776.6.B.patch, TEZ-776.7.patch, TEZ-776.8.patch, 
 TEZ-776.9.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, 
 TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, 
 TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, 
 With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, 
 Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, 
 with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png


 This is open ended at the moment.
 A fair chunk of the AM heap is taken up by TezEvents (specifically 
 DataMovementEvents - 64 bytes per event).
 Depending on the connection pattern - this puts limits on the number of tasks 
 that can be processed.
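 The sketch below is a back-of-the-envelope estimate of the heap cost. It assumes (this is not stated above) that a scatter-gather edge with M producer tasks and R consumer tasks leaves one stored DataMovementEvent per (producer, consumer) pair. It also treats the 64 bytes per event quoted above as a flat per-event cost. Under those assumptions, a 1,000 x 1,000 edge already holds 10^6 events, or 64,000,000 bytes (about 61 MiB). The cost then grows with the product M x R rather than with the task count alone. DmeHeapEstimate is a hypothetical helper.

```java
// Hypothetical estimator; assumes one stored DataMovementEvent per
// (producer task, consumer task) pair on a scatter-gather edge.
class DmeHeapEstimate {
  // Per-event figure quoted in the issue description.
  static final long BYTES_PER_EVENT = 64L;

  static long events(long producers, long consumers) {
    return producers * consumers;
  }

  static long bytes(long producers, long consumers) {
    return events(producers, consumers) * BYTES_PER_EVENT;
  }

  public static void main(String[] args) {
    long[][] shapes = {{1_000, 1_000}, {10_000, 1_000}, {10_000, 10_000}};
    for (long[] s : shapes) {
      long b = bytes(s[0], s[1]);
      System.out.printf("%d x %d tasks -> %d events, %.1f MiB%n",
          s[0], s[1], events(s[0], s[1]), b / (1024.0 * 1024.0));
    }
  }
}
```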





[jira] [Commented] (TEZ-2221) VertexGroup name should be unique

2015-05-04 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527830#comment-14527830
 ] 

Hitesh Shah commented on TEZ-2221:
--

This implies that oA (or oB) cannot belong to 2 different vertex groups and 
therefore the check currently implemented probably needs to be changed to 
account for this and not be based on vertex members of the group. 

 VertexGroup name should be unique
 -

 Key: TEZ-2221
 URL: https://issues.apache.org/jira/browse/TEZ-2221
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Fix For: 0.7.0, 0.5.4, 0.6.1

 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, 
 TEZ-2221-4.patch


 VertexGroupCommitStartedEvent and VertexGroupCommitFinishedEvent use the vertex 
 group name to identify the vertex group commit, so two vertex groups with the 
 same name will conflict. However, the current equals and hashCode of VertexGroup 
 use both the vertex group name and the member names.





[jira] [Commented] (TEZ-2221) VertexGroup name should be unique

2015-05-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527804#comment-14527804
 ] 

Bikas Saha commented on TEZ-2221:
-

The commit behavior is different. Only the participating outputs of a vertex 
are committed when a vertex group commits. A vertex can be part of 2 vertex 
groups A and B with outputs oA and oB for each group respectively. oA is 
committed when A finishes and oB is committed when B finishes.
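The sketch below is a minimal model of the commit rule described above. GroupCommitModel, createGroup, addGroupOutput and onVertexFinished are hypothetical names, not Tez APIs. Each group owns the outputs attached to it. When the last member of a group finishes, only that group's outputs are committed, even though vertex v1 belongs to both groups.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of per-group output commit; not Tez's commit code.
class GroupCommitModel {
  private final Map<String, Set<String>> membersByGroup = new LinkedHashMap<>();
  private final Map<String, List<String>> outputsByGroup = new LinkedHashMap<>();
  private final Set<String> finishedVertices = new HashSet<>();
  private final Set<String> committedGroups = new HashSet<>();
  private final List<String> committedOutputs = new ArrayList<>();

  void createGroup(String group, String... members) {
    membersByGroup.put(group, new LinkedHashSet<>(Arrays.asList(members)));
    outputsByGroup.put(group, new ArrayList<>());
  }

  void addGroupOutput(String group, String output) {
    List<String> outputs = outputsByGroup.get(group);
    if (outputs == null) {
      throw new IllegalArgumentException("Unknown group: " + group);
    }
    outputs.add(output);
  }

  // A group "finishes" once all of its member vertices have finished.
  void onVertexFinished(String vertex) {
    finishedVertices.add(vertex);
    for (Map.Entry<String, Set<String>> e : membersByGroup.entrySet()) {
      String group = e.getKey();
      if (!committedGroups.contains(group) && finishedVertices.containsAll(e.getValue())) {
        committedGroups.add(group);
        // Commit only the outputs attached to this group.
        committedOutputs.addAll(outputsByGroup.get(group));
      }
    }
  }

  List<String> committedOutputs() {
    return new ArrayList<>(committedOutputs);
  }
}
```

Finishing v1 and v2 commits only oA. oB is committed only after v3, the remaining member of B, also finishes.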

 VertexGroup name should be unique
 -

 Key: TEZ-2221
 URL: https://issues.apache.org/jira/browse/TEZ-2221
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Fix For: 0.7.0, 0.5.4, 0.6.1

 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, 
 TEZ-2221-4.patch


 VertexGroupCommitStartedEvent and VertexGroupCommitFinishedEvent use the vertex 
 group name to identify the vertex group commit, so two vertex groups with the 
 same name will conflict. However, the current equals and hashCode of VertexGroup 
 use both the vertex group name and the member names.





[jira] [Comment Edited] (TEZ-2221) VertexGroup name should be unique

2015-05-04 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527830#comment-14527830
 ] 

Hitesh Shah edited comment on TEZ-2221 at 5/5/15 3:06 AM:
--

This implies that oA (or oB) cannot belong to 2 different vertex groups and 
therefore the check currently implemented probably needs to be changed to 
account for this and not be based on vertex members of the group. 

[~bikassaha] [~zjffdu] if the above is correct, it seems that we should revert 
this commit? Agree?


was (Author: hitesh):
This implies that oA (or oB) cannot belong to 2 different vertex groups and 
therefore the check currently implemented probably needs to be changed to 
account for this and not be based on vertex members of the group. 

 VertexGroup name should be unique
 -

 Key: TEZ-2221
 URL: https://issues.apache.org/jira/browse/TEZ-2221
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Fix For: 0.7.0, 0.5.4, 0.6.1

 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, 
 TEZ-2221-4.patch


 VertexGroupCommitStartedEvent and VertexGroupCommitFinishedEvent use the vertex 
 group name to identify the vertex group commit, so two vertex groups with the 
 same name will conflict. However, the current equals and hashCode of VertexGroup 
 use both the vertex group name and the member names.





[jira] [Commented] (TEZ-2221) VertexGroup name should be unique

2015-05-04 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527869#comment-14527869
 ] 

Jeff Zhang commented on TEZ-2221:
-

I think it is a must-have to disallow
{code}
dag.createVertexGroup("group_1", v1, v2);
dag.createVertexGroup("group_1", v2, v3);
{code}
and a nice-to-have to disallow the following, to avoid any conflict between 2 
vertex groups with the same members. Although there are currently no conflicts, 
VertexGroup#addDataSink is a potential one if the same output is added to 2 
vertex groups with the same members, but that conflict would be detected by 
Vertex#addAdditionalDataSink.
{code}
dag.createVertexGroup("group_1", v1, v2);
dag.createVertexGroup("group_2", v1, v2);
{code}

Since case 1 (must-have) impacts Pig, and Pig doesn't use case 2, why not 
keep this patch?
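The sketch below implements the two checks discussed above. It uses a hypothetical VertexGroupValidator that stands in for the validation done in DAG#createVertexGroup. Vertex names are plain strings here, and the real patch may implement the checks differently. Case 1 rejects a reused group name regardless of its members. Case 2 rejects a second group with identical members; it sits behind a flag because it is only a nice-to-have.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical validator for the two cases above; not Tez's DAG#createVertexGroup.
class VertexGroupValidator {
  private final Map<String, Set<String>> groups = new HashMap<>();
  private final boolean rejectSameMembers;

  VertexGroupValidator(boolean rejectSameMembers) {
    this.rejectSameMembers = rejectSameMembers;
  }

  void createVertexGroup(String name, String... members) {
    Set<String> memberSet = new HashSet<>(Arrays.asList(members));
    // Case 1 (must-have): a group name may be used only once, whatever its members.
    if (groups.containsKey(name)) {
      throw new IllegalStateException("VertexGroup " + name + " already defined");
    }
    // Case 2 (nice-to-have): differently named groups with identical member sets.
    if (rejectSameMembers && groups.containsValue(memberSet)) {
      throw new IllegalStateException("A VertexGroup with members " + memberSet + " already exists");
    }
    groups.put(name, memberSet);
  }
}
```

With rejectSameMembers set to false, group_1 {v1, v2} followed by group_2 {v1, v2} is accepted, while a second group_1 is always rejected.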




 VertexGroup name should be unique
 -

 Key: TEZ-2221
 URL: https://issues.apache.org/jira/browse/TEZ-2221
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Fix For: 0.7.0, 0.5.4, 0.6.1

 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, 
 TEZ-2221-4.patch


 VertexGroupCommitStartedEvent and VertexGroupCommitFinishedEvent use the vertex 
 group name to identify the vertex group commit, so two vertex groups with the 
 same name will conflict. However, the current equals and hashCode of VertexGroup 
 use both the vertex group name and the member names.





[jira] [Created] (TEZ-2410) VertexGroupCommitFinishedEvent is not logged correctly

2015-05-04 Thread Jeff Zhang (JIRA)
Jeff Zhang created TEZ-2410:
---

 Summary: VertexGroupCommitFinishedEvent is not logged correctly
 Key: TEZ-2410
 URL: https://issues.apache.org/jira/browse/TEZ-2410
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Fix For: 0.7.0








[jira] [Commented] (TEZ-2408) TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2

2015-05-04 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527690#comment-14527690
 ] 

TezQA commented on TEZ-2408:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12730285/TEZ-2408.1.patch
  against master revision e762a35.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 161 javac 
compiler warnings (more than the master's current 156 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/617//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/617//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/617//console

This message is automatically generated.

 TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2 
 ---

 Key: TEZ-2408
 URL: https://issues.apache.org/jira/browse/TEZ-2408
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor
 Attachments: TEZ-2408.1.patch








Failed: TEZ-2408 PreCommit Build #617

2015-05-04 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2408
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/617/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2783 lines...]




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12730285/TEZ-2408.1.patch
  against master revision e762a35.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 161 javac 
compiler warnings (more than the master's current 156 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/617//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/617//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/617//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
d2600b3b33265b486e8394a6a086b16465d0ed8f logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #614
Archived 45 artifacts
Archive block size is 32768
Received 4 blocks and 2626264 bytes
Compression is 4.8%
Took 1.5 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2221) VertexGroup name should be unique

2015-05-04 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527692#comment-14527692
 ] 

Hitesh Shah commented on TEZ-2221:
--

I guess the question boils down to what the behavior should be. When a vertex 
group is committed, each vertex in it is committed. If the vertex also belongs 
to another group, what happens? Should a vertex be allowed to belong to 2 
vertex groups? If yes, how should its commit be handled? The above checks were 
meant to provide some verification for this case but probably need to be 
enhanced into more stringent checks.



 VertexGroup name should be unique
 -

 Key: TEZ-2221
 URL: https://issues.apache.org/jira/browse/TEZ-2221
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Fix For: 0.7.0, 0.5.4, 0.6.1

 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, 
 TEZ-2221-4.patch


 VertexGroupCommitStartedEvent and VertexGroupCommitFinishedEvent use the vertex 
 group name to identify the vertex group commit, so two vertex groups with the 
 same name will conflict. However, the current equals and hashCode of VertexGroup 
 use both the vertex group name and the member names.





[jira] [Commented] (TEZ-2408) TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2

2015-05-04 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527694#comment-14527694
 ] 

Hitesh Shah commented on TEZ-2408:
--

Committing shortly. Thanks for the review [~bikassaha]. The new warnings are due to 
the use of deprecated APIs to retain compatibility.

 TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2 
 ---

 Key: TEZ-2408
 URL: https://issues.apache.org/jira/browse/TEZ-2408
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor
 Attachments: TEZ-2408.1.patch








Failed: TEZ-776 PreCommit Build #618

2015-05-04 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-776
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/618/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2812 lines...]



{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12730303/TEZ-776.11.patch
  against master revision 210619a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 3 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/618//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/618//artifact/patchprocess/newPatchFindbugsWarningstez-api.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/618//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/618//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
403d14acfeed1e196ad7b5958877262739f5fb0a logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #614
Archived 44 artifacts
Archive block size is 32768
Received 22 blocks and 2056741 bytes
Compression is 26.0%
Took 0.48 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents

2015-05-04 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527766#comment-14527766
 ] 

TezQA commented on TEZ-776:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12730303/TEZ-776.11.patch
  against master revision 210619a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 3 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/618//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/618//artifact/patchprocess/newPatchFindbugsWarningstez-api.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/618//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/618//console

This message is automatically generated.

 Reduce AM mem usage caused by storing TezEvents
 ---

 Key: TEZ-776
 URL: https://issues.apache.org/jira/browse/TEZ-776
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Bikas Saha
 Attachments: TEZ-776.1.patch, TEZ-776.10.patch, TEZ-776.11.patch, 
 TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, TEZ-776.5.patch, 
 TEZ-776.6.A.patch, TEZ-776.6.B.patch, TEZ-776.7.patch, TEZ-776.8.patch, 
 TEZ-776.9.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, 
 TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, 
 TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, 
 With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, 
 Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, 
 with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png


 This is open ended at the moment.
 A fair chunk of the AM heap is taken up by TezEvents (specifically 
 DataMovementEvents - 64 bytes per event).
 Depending on the connection pattern - this puts limits on the number of tasks 
 that can be processed.





[jira] [Updated] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333

2015-05-04 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-2366:
--
Attachment: TEZ-2366.1.patch

 Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
 

 Key: TEZ-2366
 URL: https://issues.apache.org/jira/browse/TEZ-2366
 Project: Apache Tez
  Issue Type: Bug
Reporter: Daniel Dai
Assignee: Prakash Ramachandran
Priority: Critical
 Attachments: TEZ-2366.1.patch, TEZ-2366.test.txt, TEZ-2366.wip.1.patch


 Around 20 unit tests (out of around 2000) fail intermittently after 
 TEZ-2333. Here is a stack trace:
 {code}
 org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
 output/attempt_1429899954360_0001_1_01_00_1_10003/file.out.index in any 
 of the configured local directories
 at 
 org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449)
 at 
 org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
 at 
 org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:611)
 at 
 org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:591)
 at 
 org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:536)
 at 
 org.apache.tez.runtime.library.common.shuffle.Fetcher.setupLocalDiskFetch(Fetcher.java:517)
 at 
 org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:190)
 at 
 org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72)
 at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 To reproduce this in the Pig tests, use the following commands:
 svn co http://svn.apache.org/repos/asf/pig/trunk
 ant -Dhadoopversion=23 -Dtest.exec.type=tez -Dtestcase=TestTezAutoParallelism test
 Note in Pig codebase, we already set TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to 
 true 
 (http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java?view=markup).
  I tried changing TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to false in Pig and does 
 not help. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-776) Reduce AM mem usage caused by storing TezEvents

2015-05-04 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-776:
---
Attachment: TEZ-776.12.patch

 Reduce AM mem usage caused by storing TezEvents
 ---

 Key: TEZ-776
 URL: https://issues.apache.org/jira/browse/TEZ-776
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Bikas Saha
 Attachments: TEZ-776.1.patch, TEZ-776.10.patch, TEZ-776.11.patch, 
 TEZ-776.12.patch, TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, 
 TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.6.B.patch, TEZ-776.7.patch, 
 TEZ-776.8.patch, TEZ-776.9.patch, TEZ-776.ondemand.1.patch, 
 TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, 
 TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, 
 TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, 
 With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, 
 events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, 
 without_patch_jmc_output_of_AM.png


 This is open-ended at the moment.
 A fair chunk of the AM heap is taken up by TezEvents (specifically 
 DataMovementEvents, at 64 bytes per event).
 Depending on the connection pattern, this limits the number of tasks 
 that can be processed.
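As a rough back-of-the-envelope sketch, assume a scatter-gather edge where each of M producer tasks routes one DataMovementEvent to each of N consumer tasks, and take the 64 bytes per event quoted above at face value. Object headers, references and routing indices are ignored. Under those assumptions, the AM heap held by routed events grows with M x N. EventHeapEstimate and routedEventBytes are hypothetical names.

```java
final class EventHeapEstimate {
    // Per-event size quoted in the TEZ-776 description (an approximation).
    static final long BYTES_PER_EVENT = 64L;

    // Bytes held by the AM if each of `producers` tasks routes one event
    // to each of `consumers` tasks, i.e. producers x consumers stored events.
    static long routedEventBytes(long producers, long consumers) {
        return producers * consumers * BYTES_PER_EVENT;
    }

    public static void main(String[] args) {
        long bytes = routedEventBytes(1_000, 1_000);
        // 1,000 x 1,000 x 64 = 64,000,000 bytes (about 61 MiB)
        System.out.printf("%d bytes (~%.0f MiB)%n", bytes, bytes / (1024.0 * 1024.0));
    }
}
```

Doubling both M and N quadruples the estimate, which is why the connection pattern caps the number of tasks the AM can handle.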



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents

2015-05-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527845#comment-14527845
 ] 

Bikas Saha commented on TEZ-776:


prepareForRouting is guarded by synchronized in Edge, which creates a read/write 
barrier.
Agree about duplication, but each case has minor differences in which indices 
to use or which events to create, and is hence hard to merge. Once we move away 
from event creation in the AM, there will be more scope to reduce duplication.
Trying to keep the new abstract class for ODR complete in itself, with the 
eventual goal of not deriving from the legacy class.
The array list size read is thread safe. There is only 1 writer, which prevents 
concurrent modification. The size in an array/linked list is an int that is 
atomically modified. There have been no issues in numerous stress simulations 
and large jobs.

Broadcast edge manager cannot continue to use legacy routing, since every 
consumer task needs events from every producer task, leading to memory reference 
overhead proportional to MxN, which is large for large jobs.
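A minimal sketch of the on-demand idea for a broadcast edge. It uses hypothetical names (OnDemandBroadcastEdge, addProducerEvent, eventsFor) rather than the actual Tez ODR classes. Each producer event is stored once, and each consumer task keeps only a cursor, so the AM holds M events plus N cursors instead of MxN per-consumer references. For simplicity, both the append and the read are synchronized here, which provides the read/write barrier described above. Events are plain strings standing in for TezEvents.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class OnDemandBroadcastEdge {
    // Each producer event is stored exactly once, however many consumers there are.
    private final List<String> producerEvents = new ArrayList<>();
    // Per-consumer index of the next event that consumer has not yet received.
    private final int[] consumerCursor;

    OnDemandBroadcastEdge(int numConsumers) {
        this.consumerCursor = new int[numConsumers];
    }

    // Single writer: called once per producer event.
    synchronized void addProducerEvent(String event) {
        producerEvents.add(event);
    }

    // Returns up to maxEvents events this consumer has not seen yet and advances its cursor.
    synchronized List<String> eventsFor(int consumer, int maxEvents) {
        int from = consumerCursor[consumer];
        int to = Math.min(producerEvents.size(), from + maxEvents);
        consumerCursor[consumer] = to;
        // Copy the slice so the caller is unaffected by later appends.
        return Collections.unmodifiableList(new ArrayList<>(producerEvents.subList(from, to)));
    }

    synchronized int storedEventCount() {
        return producerEvents.size();
    }

    public static void main(String[] args) {
        OnDemandBroadcastEdge edge = new OnDemandBroadcastEdge(3);
        edge.addProducerEvent("p0");
        edge.addProducerEvent("p1");
        System.out.println(edge.eventsFor(0, 10));   // [p0, p1]
        System.out.println(edge.eventsFor(2, 1));    // [p0]
        System.out.println(edge.storedEventCount()); // 2
    }
}
```

Returned slices are transient copies. The long-lived state stays proportional to M + N, not M x N.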

I wish I could share your optimism about TEZ-2409 being 10 lines of code, but I am 
afraid I have tried to do it and found it to be a little more involved than 
that. Besides, 10 lines of code would need many more lines of new tests. This 
does not have to be a blocker for 0.7.0, since it's an internal framework change 
and can be done in 0.7.1.

Uploaded new patch with fixes.

[~hitesh] [~rajesh.balamohan] There have been fixes for your review comments 
made in subsequent patches. Do you want to look at them?

 Reduce AM mem usage caused by storing TezEvents
 ---

 Key: TEZ-776
 URL: https://issues.apache.org/jira/browse/TEZ-776
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Bikas Saha
 Attachments: TEZ-776.1.patch, TEZ-776.10.patch, TEZ-776.11.patch, 
 TEZ-776.12.patch, TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, 
 TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.6.B.patch, TEZ-776.7.patch, 
 TEZ-776.8.patch, TEZ-776.9.patch, TEZ-776.ondemand.1.patch, 
 TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, 
 TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, 
 TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, 
 With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, 
 events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, 
 without_patch_jmc_output_of_AM.png





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TEZ-2221) VertexGroup name should be unqiue

2015-05-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527846#comment-14527846
 ] 

Bikas Saha edited comment on TEZ-2221 at 5/5/15 3:33 AM:
-

By definition, oA and oB cannot be part of 2 different groups because they are 
added to vertexGroups in the API using VertexGroup#addDataSink. So it's 
impossible for the same output/edge to be part of 2 vertex groups.


was (Author: bikassaha):
By definition, oA and oB cannot be part of 2 different groups because they are 
added to vertexGroups in the API. So it's impossible for the same output/edge to 
be part of 2 vertex groups.

 VertexGroup name should be unqiue
 -

 Key: TEZ-2221
 URL: https://issues.apache.org/jira/browse/TEZ-2221
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Fix For: 0.7.0, 0.5.4, 0.6.1

 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, 
 TEZ-2221-4.patch


 VertexGroupCommitStartedEvent and VertexGroupCommitFinishedEvent use the vertex 
 group name to identify the vertex group commit, so two vertex groups with the 
 same name will conflict. However, the current equals and hashCode of VertexGroup 
 use both the vertex group name and the member names.
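To illustrate the check the summary asks for, here is a minimal sketch that rejects duplicate vertex group names when the DAG is built. It is not the TEZ-2221 patch. It still allows one vertex to belong to several differently named groups, which covers the v1-in-groups-A-and-B case. The Group class and the validate method are hypothetical stand-ins for VertexGroup and DAG validation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class VertexGroupNameCheck {
    // Hypothetical stand-in for a vertex group: a name plus member vertex names.
    static final class Group {
        final String name;
        final Set<String> members;

        Group(String name, String... members) {
            this.name = name;
            this.members = new LinkedHashSet<>(Arrays.asList(members));
        }
    }

    // Commit events are keyed only by group name, so two groups with the same
    // name must be rejected. Membership is not checked: the same vertex may
    // appear in several groups as long as the group names differ.
    static void validate(List<Group> groups) {
        Set<String> seenNames = new HashSet<>();
        for (Group g : groups) {
            if (!seenNames.add(g.name)) {
                throw new IllegalArgumentException("Duplicate vertex group name: " + g.name);
            }
        }
    }

    public static void main(String[] args) {
        // Accepted: v1 is in two groups, but the names differ.
        validate(Arrays.asList(new Group("A", "v1"), new Group("B", "v1")));
        try {
            validate(Arrays.asList(new Group("A", "v1"), new Group("A", "v2")));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Duplicate vertex group name: A
        }
    }
}
```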



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2221) VertexGroup name should be unqiue

2015-05-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527846#comment-14527846
 ] 

Bikas Saha commented on TEZ-2221:
-

By definition, oA and oB cannot be part of 2 different groups because they are 
added to vertexGroups in the API. So it's impossible for the same output/edge to 
be part of 2 vertex groups.

 VertexGroup name should be unqiue
 -

 Key: TEZ-2221
 URL: https://issues.apache.org/jira/browse/TEZ-2221
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Fix For: 0.7.0, 0.5.4, 0.6.1

 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, 
 TEZ-2221-4.patch





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2221) VertexGroup name should be unqiue

2015-05-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527850#comment-14527850
 ] 

Bikas Saha commented on TEZ-2221:
-

Unless there is a technical reason not to support v1,v2 in multiple vertex 
groups simultaneously, we should support it. If this jira has committed 
something to the contrary, then we could revert the changes and redo them before 
a release. VertexGroups might be our cheaper answer to multiple edges between 
the same vertices. So let's not curtail any functionality that exists today, by 
design or accident :)

 VertexGroup name should be unqiue
 -

 Key: TEZ-2221
 URL: https://issues.apache.org/jira/browse/TEZ-2221
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Fix For: 0.7.0, 0.5.4, 0.6.1

 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, 
 TEZ-2221-4.patch





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-2392) Have all readers throw an Exception on incorrect next() usage

2015-05-04 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-2392:
--
Attachment: TEZ-2392.3.patch

Thanks @sseth, [~hitesh].
- Yes, the condition in valuesIterator is unavoidable.
- Added a comment in MRInput.getReader().
- Missed the minor test case TestUnorderedKVReader.java in the earlier patch; 
added it in the latest patch.

Will commit it once pre-commit passes.

 Have all readers throw an Exception on incorrect next() usage
 -

 Key: TEZ-2392
 URL: https://issues.apache.org/jira/browse/TEZ-2392
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Rajesh Balamohan
Priority: Critical
 Attachments: TEZ-2392.1.patch, TEZ-2392.2.patch, TEZ-2392.3.patch


 Follow up from TEZ-2348.
 Marking as critical since this is a behaviour change, and we should get it in 
 early.
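A minimal sketch of the behaviour change in the summary. It uses a hypothetical StrictListReader rather than Tez's actual KeyValueReader implementations. The choice of IllegalStateException is an assumption; the real readers may throw a different exception type.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class StrictListReader<T> {
    private final Iterator<T> records;
    private T current;
    private boolean exhausted;

    StrictListReader(List<T> records) {
        this.records = records.iterator();
    }

    // Moves to the next record and returns true, or returns false at end of input.
    // Another call after false has been returned is a caller bug, so it throws
    // instead of silently returning false again.
    boolean next() {
        if (exhausted) {
            throw new IllegalStateException("next() called after the reader returned false");
        }
        if (records.hasNext()) {
            current = records.next();
            return true;
        }
        exhausted = true;
        current = null;
        return false;
    }

    T getCurrent() {
        return current;
    }

    public static void main(String[] args) {
        StrictListReader<String> reader = new StrictListReader<>(Arrays.asList("k1", "k2"));
        while (reader.next()) {
            System.out.println(reader.getCurrent());
        }
        try {
            reader.next(); // incorrect usage: input already exhausted
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```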



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2221) VertexGroup name should be unqiue

2015-05-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527872#comment-14527872
 ] 

Bikas Saha commented on TEZ-2221:
-

If VertexGroup(A, v1) and VertexGroup(B, v1) and connecting both to v2 
allows for multiple edges between v1 and v2 then we should allow 2. That's the 
simplest solution to the multiple edges issue. But this needs to be verified.

 VertexGroup name should be unqiue
 -

 Key: TEZ-2221
 URL: https://issues.apache.org/jira/browse/TEZ-2221
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Fix For: 0.7.0, 0.5.4, 0.6.1

 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, 
 TEZ-2221-4.patch





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-2410) VertexGroupCommitFinishedEvent is not logged correctly

2015-05-04 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-2410:

Priority: Blocker  (was: Major)

 VertexGroupCommitFinishedEvent is not logged correctly
 --

 Key: TEZ-2410
 URL: https://issues.apache.org/jira/browse/TEZ-2410
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Blocker
 Fix For: 0.7.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333

2015-05-04 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527881#comment-14527881
 ] 

Hitesh Shah commented on TEZ-2366:
--

[~pramachandran] Can you confirm that this path is not invoked in local mode? 

 Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
 

 Key: TEZ-2366
 URL: https://issues.apache.org/jira/browse/TEZ-2366
 Project: Apache Tez
  Issue Type: Bug
Reporter: Daniel Dai
Assignee: Prakash Ramachandran
Priority: Critical
 Attachments: TEZ-2366.1.patch, TEZ-2366.test.txt, TEZ-2366.wip.1.patch





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333

2015-05-04 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527881#comment-14527881
 ] 

Hitesh Shah edited comment on TEZ-2366 at 5/5/15 4:26 AM:
--

[~pramachandran] Can you confirm that this path is not invoked in local mode? 
The shuffle metadata will not be present in local mode. 

In any case, to be safe, it might be better to write more defensive code 
for retrieving the shuffle port and, if the shuffle port is not available, 
disable local fetch. 
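A minimal sketch of the defensive approach suggested above. The names are hypothetical: LocalFetchGuard, readShufflePort, and a string-keyed metadata map with a "shuffle.port" key stand in for however the fetcher actually reads the shuffle provider metadata. A missing or unparsable port simply turns local fetch off instead of failing.

```java
import java.util.Collections;
import java.util.Map;
import java.util.OptionalInt;

final class LocalFetchGuard {
    // Hypothetical metadata key; Tez's real shuffle service metadata is encoded differently.
    static final String SHUFFLE_PORT_KEY = "shuffle.port";

    // Returns the shuffle port if it is present and valid, otherwise empty
    // (for example in local mode, where no shuffle metadata exists).
    static OptionalInt readShufflePort(Map<String, String> serviceMetadata) {
        if (serviceMetadata == null) {
            return OptionalInt.empty();
        }
        String raw = serviceMetadata.get(SHUFFLE_PORT_KEY);
        if (raw == null) {
            return OptionalInt.empty();
        }
        try {
            int port = Integer.parseInt(raw.trim());
            return (port > 0 && port <= 65535) ? OptionalInt.of(port) : OptionalInt.empty();
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    // Local fetch is used only when it is configured AND a usable shuffle port was found.
    static boolean useLocalFetch(boolean localFetchConfigured, Map<String, String> serviceMetadata) {
        return localFetchConfigured && readShufflePort(serviceMetadata).isPresent();
    }

    public static void main(String[] args) {
        System.out.println(useLocalFetch(true, Collections.<String, String>emptyMap()));
        System.out.println(useLocalFetch(true, Collections.singletonMap(SHUFFLE_PORT_KEY, "13562")));
    }
}
```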


was (Author: hitesh):
[~pramachandran] Can you confirm that this path is not invoked in local mode? 

 Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
 

 Key: TEZ-2366
 URL: https://issues.apache.org/jira/browse/TEZ-2366
 Project: Apache Tez
  Issue Type: Bug
Reporter: Daniel Dai
Assignee: Prakash Ramachandran
Priority: Critical
 Attachments: TEZ-2366.1.patch, TEZ-2366.test.txt, TEZ-2366.wip.1.patch





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-2411) Offload DataMovement event creation from the AM to the tasks

2015-05-04 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-2411:

Description: Today the AM creates a new DataMovement event from the 
original event sent by the producer task and supplements the new event with 
source/target indices for the consumer task. This new event creation can be 
offloaded to the task runtime and thus save CPU cycles on the AM for the object 
creation. Secondly, the original event can be kept in serialized form inside 
the AM and sent as is to the task over the RPC, thus potentially saving serde 
CPU for these events in addition to the object creation CPU. This can help when 
there is a high concurrency of running tasks in a job. Say 1 tasks running 
in parallel and sending events to the AM.  (was: Today the AM creates a new 
DataMovement event from the original event sent by the producer task and 
supplements the new event with source/target indices for the consumer task. 
This new event creation can be offloaded to the task runtime and thus save CPU 
cycles on the AM for the object creation. Secondly, the original event can be 
kept in serialized form inside the AM and sent as is to the task over the RPC, 
thus potentially saving serde CPU for these events in addition to the object 
creation CPU.)

 Offload DataMovement event creation from the AM to the tasks
 

 Key: TEZ-2411
 URL: https://issues.apache.org/jira/browse/TEZ-2411
 Project: Apache Tez
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Bikas Saha

 Today the AM creates a new DataMovement event from the original event sent by 
 the producer task and supplements the new event with source/target indices 
 for the consumer task. This new event creation can be offloaded to the task 
 runtime and thus save CPU cycles on the AM for the object creation. Secondly, 
 the original event can be kept in serialized form inside the AM and sent as 
 is to the task over the RPC, thus potentially saving serde CPU for these 
 events in addition to the object creation CPU. This can help when there is a 
 high concurrency of running tasks in a job. Say 1 tasks running in 
 parallel and sending events to the AM.
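A minimal sketch of the two ideas in the description, using hypothetical classes rather than Tez's event or RPC types. The AM copies the producer's serialized payload verbatim into the outgoing frame, alongside the source/target indices. Only the consumer task decodes the frame and builds an event object.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class OffloadedEventSketch {
    // AM side: copy the producer's already-serialized payload verbatim into the
    // outgoing frame, prefixed by the routing indices. No event object is built
    // and the payload is never deserialized in the AM.
    static byte[] frameForTask(byte[] serializedPayload, int sourceIndex, int targetIndex) {
        ByteBuffer buf = ByteBuffer.allocate(2 * Integer.BYTES + serializedPayload.length);
        buf.putInt(sourceIndex).putInt(targetIndex).put(serializedPayload);
        return buf.array();
    }

    // Task side: the consumer builds its own event object from the frame.
    static final class TaskSideEvent {
        final int sourceIndex;
        final int targetIndex;
        final byte[] payload;

        TaskSideEvent(int sourceIndex, int targetIndex, byte[] payload) {
            this.sourceIndex = sourceIndex;
            this.targetIndex = targetIndex;
            this.payload = payload;
        }
    }

    static TaskSideEvent decode(byte[] frame) {
        ByteBuffer buf = ByteBuffer.wrap(frame);
        int sourceIndex = buf.getInt();
        int targetIndex = buf.getInt();
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        return new TaskSideEvent(sourceIndex, targetIndex, payload);
    }

    public static void main(String[] args) {
        byte[] payload = "spill-info".getBytes(StandardCharsets.UTF_8);
        TaskSideEvent e = decode(frameForTask(payload, 3, 7));
        System.out.println(e.sourceIndex + " -> " + e.targetIndex + ": "
                + new String(e.payload, StandardCharsets.UTF_8));
    }
}
```

With many tasks running concurrently, the AM-side cost per routed event is a buffer copy, while object creation and deserialization move to the tasks.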



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2369) Add a few unit tests for RootInputInitializerManager

2015-05-04 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527884#comment-14527884
 ] 

Hitesh Shah commented on TEZ-2369:
--

The patch does not include the Integer successfulAttempt = 
vertexSuccessfulAttemptMap.get(taskId.getId()); change. Maybe that should be 
added back, as it seems a safe enough change to backport to older 
branches.

+1 for the unit test change.  

 Add a few unit tests for RootInputInitializerManager
 

 Key: TEZ-2369
 URL: https://issues.apache.org/jira/browse/TEZ-2369
 Project: Apache Tez
  Issue Type: Bug
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Attachments: TEZ-2369.1.txt, TEZ-2369.2.txt


 {code}
 -  Integer successfulAttempt = vertexSuccessfulAttemptMap.get(taskId);
 +  Integer successfulAttempt = 
 vertexSuccessfulAttemptMap.get(taskId.getId());
 {code}
 This could cause events to be sent multiple times.
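The diff above compiles in both forms because java.util.Map#get accepts any Object. A minimal sketch, with a hypothetical TaskId class standing in for Tez's task ID type, shows why the original form silently returns null. As a result, the caller never sees the recorded successful attempt.

```java
import java.util.HashMap;
import java.util.Map;

final class WrongKeyLookup {
    // Hypothetical task ID wrapper; it never equals an Integer.
    static final class TaskId {
        private final int id;
        TaskId(int id) { this.id = id; }
        int getId() { return id; }
    }

    // Original form: compiles because Map#get takes Object, but a TaskId key
    // can never match an Integer key, so this always returns null.
    static Integer lookupBuggy(Map<Integer, Integer> successfulAttempts, TaskId taskId) {
        return successfulAttempts.get(taskId);
    }

    // Fixed form: looks up by the int index that the map is actually keyed by.
    static Integer lookupFixed(Map<Integer, Integer> successfulAttempts, TaskId taskId) {
        return successfulAttempts.get(taskId.getId());
    }

    public static void main(String[] args) {
        Map<Integer, Integer> vertexSuccessfulAttemptMap = new HashMap<>();
        vertexSuccessfulAttemptMap.put(5, 1); // task 5 succeeded with attempt 1
        TaskId taskId = new TaskId(5);
        System.out.println(lookupBuggy(vertexSuccessfulAttemptMap, taskId)); // null
        System.out.println(lookupFixed(vertexSuccessfulAttemptMap, taskId)); // 1
    }
}
```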



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Success: TEZ-2392 PreCommit Build #620

2015-05-04 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2392
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/620/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2801 lines...]
[INFO] Final Memory: 71M/958M
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12730353/TEZ-2392.3.patch
  against master revision 210619a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/620//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/620//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
859aba3b0f982f71e2bd7f5ab9fdaaa7af6f2484 logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #614
Archived 44 artifacts
Archive block size is 32768
Received 4 blocks and 2630228 bytes
Compression is 4.7%
Took 0.62 sec
Description set: TEZ-2392
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2392) Have all readers throw an Exception on incorrect next() usage

2015-05-04 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527935#comment-14527935
 ] 

TezQA commented on TEZ-2392:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12730353/TEZ-2392.3.patch
  against master revision 210619a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/620//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/620//console

This message is automatically generated.

 Have all readers throw an Exception on incorrect next() usage
 -

 Key: TEZ-2392
 URL: https://issues.apache.org/jira/browse/TEZ-2392
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Rajesh Balamohan
Priority: Critical
 Attachments: TEZ-2392.1.patch, TEZ-2392.2.patch, TEZ-2392.3.patch





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2076) Tez framework to extract/analyze data stored in ATS for specific dag

2015-05-04 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527932#comment-14527932
 ] 

Hitesh Shah commented on TEZ-2076:
--

bq. the zip doesn't need versioning, because it is an ATS dump of all known Tez 
keys.

[~gopalv] Thanks for the clarification. I missed the bit about the zip entry. And 
agreed: if the ATS entity JSON is being written as is, it would effectively be 
versioned based on the version of the ATS API (and the data within it 
versioned by the generation code itself).


 Tez framework to extract/analyze data stored in ATS for specific dag
 

 Key: TEZ-2076
 URL: https://issues.apache.org/jira/browse/TEZ-2076
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: TEZ-2076.1.patch, TEZ-2076.10.patch, TEZ-2076.2.patch, 
 TEZ-2076.3.patch, TEZ-2076.4.patch, TEZ-2076.5.patch, TEZ-2076.6.patch, 
 TEZ-2076.7.patch, TEZ-2076.8.patch, TEZ-2076.9.patch, TEZ-2076.WIP.2.patch, 
 TEZ-2076.WIP.3.patch, TEZ-2076.WIP.patch


 - Users should be able to download ATS data pertaining to a DAG from Tez-UI 
 (for example, a zip file containing DAG/Vertex/Task/TaskAttempt info).
 - This can be plugged into an analyzer which parses the data, adds semantics 
 and provides an in-memory representation for further analysis.
 - This will make it possible to write different analyzer rules, which can be run 
 on top of this in-memory representation to produce an analysis of the DAG.
 - Results of these analyzer rules can be rendered in a UI (standalone webapp) 
 at a later point in time.
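A minimal sketch of the rule-based analyzer idea. It assumes a hypothetical in-memory model (VertexInfo with only a name and a duration) and a hypothetical AnalyzerRule interface. The real framework's classes and the data it parses from ATS will differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AnalyzerSketch {
    // Hypothetical in-memory view of one vertex parsed from the downloaded ATS data.
    static final class VertexInfo {
        final String name;
        final long durationMs;

        VertexInfo(String name, long durationMs) {
            this.name = name;
            this.durationMs = durationMs;
        }
    }

    // An analyzer rule inspects the in-memory DAG representation and emits findings.
    interface AnalyzerRule {
        List<String> analyze(List<VertexInfo> vertices);
    }

    // Example rule: flag vertices that ran longer than a fixed threshold.
    static final class SlowVertexRule implements AnalyzerRule {
        private final long thresholdMs;

        SlowVertexRule(long thresholdMs) {
            this.thresholdMs = thresholdMs;
        }

        @Override
        public List<String> analyze(List<VertexInfo> vertices) {
            List<String> findings = new ArrayList<>();
            for (VertexInfo v : vertices) {
                if (v.durationMs > thresholdMs) {
                    findings.add(v.name + " took " + v.durationMs + " ms");
                }
            }
            return findings;
        }
    }

    // Runs every rule over the same in-memory representation and collects the results.
    static List<String> runAll(List<VertexInfo> vertices, List<? extends AnalyzerRule> rules) {
        List<String> findings = new ArrayList<>();
        for (AnalyzerRule rule : rules) {
            findings.addAll(rule.analyze(vertices));
        }
        return findings;
    }

    public static void main(String[] args) {
        List<VertexInfo> dag = Arrays.asList(new VertexInfo("map1", 1_000), new VertexInfo("reduce1", 90_000));
        System.out.println(runAll(dag, Arrays.asList(new SlowVertexRule(60_000))));
    }
}
```

Keeping rules behind one interface lets new analyses be added without touching the parser, and their string findings could later be rendered by a UI.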



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2221) VertexGroup name should be unqiue

2015-05-04 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527937#comment-14527937
 ] 

Jeff Zhang commented on TEZ-2221:
-

{code}
If VertexGroup(A, v1) and VertexGroup(B, v1) and connecting both to v2 
allows for multiple edges between v1 and v2 then we should allow 2.
{code}

This looks more like a hack or workaround for multiple edges. If we need to 
support multiple edges, we may need to create a more elegant API. 

 VertexGroup name should be unqiue
 -

 Key: TEZ-2221
 URL: https://issues.apache.org/jira/browse/TEZ-2221
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Fix For: 0.7.0, 0.5.4, 0.6.1

 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, 
 TEZ-2221-4.patch





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents

2015-05-04 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527957#comment-14527957
 ] 

TezQA commented on TEZ-776:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12730357/TEZ-776.12.patch
  against master revision 210619a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/621//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/621//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/621//console

This message is automatically generated.

 Reduce AM mem usage caused by storing TezEvents
 ---

 Key: TEZ-776
 URL: https://issues.apache.org/jira/browse/TEZ-776
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Bikas Saha
 Attachments: TEZ-776.1.patch, TEZ-776.10.patch, TEZ-776.11.patch, 
 TEZ-776.12.patch, TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, 
 TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.6.B.patch, TEZ-776.7.patch, 
 TEZ-776.8.patch, TEZ-776.9.patch, TEZ-776.ondemand.1.patch, 
 TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, 
 TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, 
 TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, 
 With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, 
 events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, 
 without_patch_jmc_output_of_AM.png





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Failed: TEZ-776 PreCommit Build #621

2015-05-04 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-776
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/621/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2808 lines...]




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12730357/TEZ-776.12.patch
  against master revision 210619a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/621//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/621//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/621//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
982fcba0ffad0431e426d9b5ef984d841278279b logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #620
Archived 44 artifacts
Archive block size is 32768
Received 4 blocks and 2636731 bytes
Compression is 4.7%
Took 1.5 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2393) Tez pickup PATH env from gateway machine

2015-05-04 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527296#comment-14527296
 ] 

Jason Lowe commented on TEZ-2393:
-

I think the main problems will come from anyone who expects the old behavior.  
For example, -Dsome.mapred.or.tez.property='$bar' today expands to the client's 
value of bar rather than the container's.  Today, if one explicitly wants the 
variable to be expanded by the container launch process, they can use this 
syntax instead:
{noformat}
-Dsome.mapred.or.tez.property='{{bar}}'
{noformat}

I agree the existing behavior seems like a bug, but I don't know how many users 
are relying on the current behavior.  Note that 
org.apache.hadoop.yarn.util.Apps.setEnvFromInputString in YARN has the same 
issues, and that's the one currently used by MapReduce.
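The difference between the two syntaxes can be modeled with a small, hypothetical Java sketch: a client-side pass substitutes each Unix-style $NAME from the client's environment, while {{NAME}} passes through untouched so that the container launch can expand it later. This is a simplified model for illustration only, not the code of Apps.setEnvFromInputString or of Tez, and it deliberately ignores the ${NAME} and Windows %NAME% forms.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of client-side variable expansion (not YARN's or Tez's implementation).
class EnvExpansionSketch {
    // Unix-style $NAME only; ${NAME} and %NAME% are not handled in this sketch.
    private static final Pattern UNIX_VAR = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    // Replaces each $NAME with the client's value (empty if unset); {{NAME}} is left untouched.
    static String expandClientSide(String value, Map<String, String> clientEnv) {
        Matcher m = UNIX_VAR.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = clientEnv.getOrDefault(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> client = Collections.singletonMap("bar", "client-value");
        System.out.println(expandClientSide("$bar", client));    // prints "client-value"
        System.out.println(expandClientSide("{{bar}}", client)); // prints "{{bar}}"
    }
}
```

In this model, anyone who relies on '$bar' resolving to the client's value would see different results if the expansion were moved to the container, which is the compatibility concern raised above.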

 Tez pickup PATH env from gateway machine
 

 Key: TEZ-2393
 URL: https://issues.apache.org/jira/browse/TEZ-2393
 Project: Apache Tez
  Issue Type: Bug
Reporter: Daniel Dai
Assignee: Hitesh Shah
 Attachments: TEZ-2393.1.patch


 I found this issue on Windows. When I run:
 set PATH=C:\dummy;%PATH%
 and then run a Tez job, C:\dummy appears in the PATH of the vertex container. 
 This is surprising, since we don't expect the frontend PATH to propagate to the backend.
 [~hitesh] tried it on Linux and found the same behavior.





[jira] [Commented] (TEZ-2408) TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2

2015-05-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527324#comment-14527324
 ] 

Bikas Saha commented on TEZ-2408:
-

lgtm. I remember fixing these (perhaps it was TestTaskImpl).

 TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2 
 ---

 Key: TEZ-2408
 URL: https://issues.apache.org/jira/browse/TEZ-2408
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor
 Attachments: TEZ-2408.1.patch








[jira] [Commented] (TEZ-2221) VertexGroup name should be unqiue

2015-05-04 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527425#comment-14527425
 ] 

Rohini Palaniswamy commented on TEZ-2221:
-

bq. What happens if someone does the following? This should also be disallowed, 
correct?
{code}
dag.createVertexGroup(group_1, v1,v2);
dag.createVertexGroup(group_2, v1,v2);
{code}

[~daijy] pointed out that this breaks a lot of Pig scripts on Tez with 
UnionOptimizer, as we have multiple outputs from each vertex and we now create a 
vertex group for each of those outputs. For example, take a union followed by an 
order by: there will be one sample output and one partitioner output from the 
union vertex, going to two different downstream vertices. With the 
UnionOptimizer, the union is removed and two vertex groups are created. If this 
is disallowed, we will have to reuse the same vertex group to route multiple 
outputs. The GroupInputEdge.create(VertexGroup inputVertexGroup, Vertex 
outputVertex, EdgeProperty edgeProperty, InputDescriptor mergedInput) API seems 
to allow that. Will that work, and is that how you want us to construct the plan?

Consider another case: a union followed by a replicate join with two tables, 
followed by an order by.  The plan will consist of 8 vertices: V1 (Load) + V2 
(Load) + V3 (union) + V4a (Replicate join T1 load) + V4b (Replicate join T2 
load) + V5 (partitioner) + V6 (sampler) + V7 (order by), with edges V1,V2->V3, 
V4a->V3, V4b->V3, V3->V5, V3->V6, V6->V5, V5->V7.  The optimized plan will become 
V4a -> (V1,V2 vertex group), V4b -> (V1,V2 vertex group), (V1,V2 vertex 
group) -> V5, (V1,V2 vertex group) -> V6, V6->V5, V5->V7. So is using one vertex 
group for routing multiple outputs and multiple inputs how we are expected 
to construct the plan? 



 VertexGroup name should be unqiue
 -

 Key: TEZ-2221
 URL: https://issues.apache.org/jira/browse/TEZ-2221
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Fix For: 0.7.0, 0.5.4, 0.6.1

 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, 
 TEZ-2221-4.patch


 VertexGroupCommitStartedEvent and VertexGroupCommitFinishedEvent use the vertex 
 group name to identify the vertex group commit, so vertex groups with the same 
 name will conflict. However, the current equals and hashCode of VertexGroup use 
 both the vertex group name and the member names.





[jira] [Comment Edited] (TEZ-2221) VertexGroup name should be unqiue

2015-05-04 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527425#comment-14527425
 ] 

Rohini Palaniswamy edited comment on TEZ-2221 at 5/4/15 10:12 PM:
--

bq. What happens if someone does the following? This should also be disallowed, 
correct?
{code}
dag.createVertexGroup(group_1, v1,v2);
dag.createVertexGroup(group_2, v1,v2);
{code}

[~daijy] pointed out that this breaks a lot of Pig scripts on Tez with 
UnionOptimizer, as we have multiple outputs from each vertex and we now create a 
vertex group for each of those outputs. For example, take a union followed by an 
order by: there will be one sample output and one partitioner output from the 
union vertex, going to two different downstream vertices. With the 
UnionOptimizer, the union is removed and two vertex groups are created. If this 
is disallowed, we will have to reuse the same vertex group to route multiple 
outputs. The GroupInputEdge.create(VertexGroup inputVertexGroup, Vertex 
outputVertex, EdgeProperty edgeProperty, InputDescriptor mergedInput) API seems 
to allow that. Will that work, and is that how you want us to construct the plan?

Consider another case: a union followed by a replicate join with two tables, 
followed by an order by.  The plan will consist of 8 vertices: V1 (Load) + V2 
(Load) + V3 (union) + V4 (Replicate join T1 load) + V5 (Replicate join T2 load) 
+ V6 (partitioner) + V7 (sampler) + V8 (order by), with edges V1,V2->V3, V4->V3, 
V5->V3, V3->V6, V3->V7, V7->V6, V6->V8.  The optimized plan will become 
V4 -> (V1,V2 vertex group), V5 -> (V1,V2 vertex group), (V1,V2 vertex group) -> V6, 
(V1,V2 vertex group) -> V7, V7->V6, V6->V8. So is using one vertex group for 
routing multiple outputs and multiple inputs how we are expected to construct 
the plan? 
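Writing the optimized plan down as an edge list shows what is being asked: the single (V1,V2) vertex group ends up with two incoming edges (from V4 and V5) and two outgoing edges (to V6 and V7). The sketch below is a hypothetical plain-Java model of that plan, not the Tez DAG API; "G(V1,V2)" is only a label for the vertex group.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical edge-list model of the optimized plan (not the Tez DAG API).
class OptimizedPlanSketch {
    static final String GROUP = "G(V1,V2)"; // label for the single (V1,V2) vertex group

    static List<String[]> optimizedEdges() {
        return Arrays.asList(
            new String[] {"V4", GROUP},  // replicate join T1 load -> group
            new String[] {"V5", GROUP},  // replicate join T2 load -> group
            new String[] {GROUP, "V6"},  // group -> partitioner
            new String[] {GROUP, "V7"},  // group -> sampler
            new String[] {"V7", "V6"},   // sampler -> partitioner
            new String[] {"V6", "V8"});  // partitioner -> order by
    }

    static long inDegree(List<String[]> edges, String vertex) {
        return edges.stream().filter(e -> e[1].equals(vertex)).count();
    }

    static long outDegree(List<String[]> edges, String vertex) {
        return edges.stream().filter(e -> e[0].equals(vertex)).count();
    }

    public static void main(String[] args) {
        List<String[]> edges = optimizedEdges();
        // prints "G(V1,V2): in=2, out=2"
        System.out.println(GROUP + ": in=" + inDegree(edges, GROUP) + ", out=" + outDegree(edges, GROUP));
    }
}
```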







 VertexGroup name should be unqiue
 -

 Key: TEZ-2221
 URL: https://issues.apache.org/jira/browse/TEZ-2221
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Fix For: 0.7.0, 0.5.4, 0.6.1

 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, 
 TEZ-2221-4.patch


 VertexGroupCommitStartedEvent and VertexGroupCommitFinishedEvent use the vertex 
 group name to identify the vertex group commit, so vertex groups with the same 
 name will conflict. However, the current equals and hashCode of VertexGroup use 
 both the vertex group name and the member names.





[jira] [Updated] (TEZ-776) Reduce AM mem usage caused by storing TezEvents

2015-05-04 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-776:
---
Attachment: TEZ-776.11.patch

Uploading new patch that creates a new abstract class for on-demand routing 
APIs, leaving the legacy plugin API unchanged. Opened TEZ-2409 to make changes 
for supporting different plugins on the same vertex. Hopefully this addresses 
any remaining concerns.

 Reduce AM mem usage caused by storing TezEvents
 ---

 Key: TEZ-776
 URL: https://issues.apache.org/jira/browse/TEZ-776
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Bikas Saha
 Attachments: TEZ-776.1.patch, TEZ-776.10.patch, TEZ-776.11.patch, 
 TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, TEZ-776.5.patch, 
 TEZ-776.6.A.patch, TEZ-776.6.B.patch, TEZ-776.7.patch, TEZ-776.8.patch, 
 TEZ-776.9.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, 
 TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, 
 TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, 
 With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, 
 Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, 
 with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png


 This is open-ended at the moment.
 A fair chunk of the AM heap is taken up by TezEvents (specifically 
 DataMovementEvents, at 64 bytes per event).
 Depending on the connection pattern, this limits the number of tasks 
 that can be processed.





[jira] [Commented] (TEZ-2221) VertexGroup name should be unqiue

2015-05-04 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527448#comment-14527448
 ] 

Hitesh Shah commented on TEZ-2221:
--

[~rohini] Yes, I believe both are being disallowed, i.e.

{code}
dag.createVertexGroup(group_1, v1,v2);
dag.createVertexGroup(group_2, v1,v2);
{code}

and

{code}
dag.createVertexGroup(group_1, v1,v2);
dag.createVertexGroup(group_1, v2,v3);
{code}
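The two rules described above can be sketched as a small, hypothetical registry: it rejects a second group that reuses an existing name, and a second group over the same member set under a different name. Vertices are plain strings here, and the class is illustrative only; it does not reproduce the TEZ-2221 patch or Tez's DAG class.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical registry enforcing both uniqueness rules (not Tez's DAG class).
class VertexGroupRegistry {
    private final Map<String, Set<String>> membersByGroupName = new HashMap<>();
    private final Set<Set<String>> memberSets = new HashSet<>();

    void createVertexGroup(String name, String... members) {
        Set<String> memberSet = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(members)));
        // Rule 1: a group name may be used only once (commit events are keyed by name).
        if (membersByGroupName.containsKey(name)) {
            throw new IllegalStateException("VertexGroup name already used: " + name);
        }
        // Rule 2: the same set of member vertices may not be grouped twice under different names.
        if (memberSets.contains(memberSet)) {
            throw new IllegalStateException("VertexGroup with members " + memberSet + " already exists");
        }
        membersByGroupName.put(name, memberSet);
        memberSets.add(memberSet);
    }
}
```

Comparing member sets (rather than ordered lists) means createVertexGroup("group_2", "v2", "v1") is rejected after createVertexGroup("group_1", "v1", "v2"), which matches the intent of the first example.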

 VertexGroup name should be unqiue
 -

 Key: TEZ-2221
 URL: https://issues.apache.org/jira/browse/TEZ-2221
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Fix For: 0.7.0, 0.5.4, 0.6.1

 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, 
 TEZ-2221-4.patch


 VertexGroupCommitStartedEvent and VertexGroupCommitFinishedEvent use the vertex 
 group name to identify the vertex group commit, so vertex groups with the same 
 name will conflict. However, the current equals and hashCode of VertexGroup use 
 both the vertex group name and the member names.





[jira] [Commented] (TEZ-2221) VertexGroup name should be unqiue

2015-05-04 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527463#comment-14527463
 ] 

Hitesh Shah commented on TEZ-2221:
--

\cc [~bikassaha] in case he has any input. 

 VertexGroup name should be unqiue
 -

 Key: TEZ-2221
 URL: https://issues.apache.org/jira/browse/TEZ-2221
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Fix For: 0.7.0, 0.5.4, 0.6.1

 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, 
 TEZ-2221-4.patch


 VertexGroupCommitStartedEvent and VertexGroupCommitFinishedEvent use the vertex 
 group name to identify the vertex group commit, so vertex groups with the same 
 name will conflict. However, the current equals and hashCode of VertexGroup use 
 both the vertex group name and the member names.





[jira] [Commented] (TEZ-2237) Complex DAG freezes and fails (was BufferTooSmallException raised in UnorderedPartitionedKVWriter then DAG lingers)

2015-05-04 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527511#comment-14527511
 ] 

Rajesh Balamohan commented on TEZ-2237:
---

lgtm. +1.  

Might need to fix the log statement "Setting all {} partitions as empty for 
non-started output: " in TEZ-2237.2.branch6.txt before committing. 

 Complex DAG freezes and fails (was BufferTooSmallException raised in 
 UnorderedPartitionedKVWriter then DAG lingers)
 ---

 Key: TEZ-2237
 URL: https://issues.apache.org/jira/browse/TEZ-2237
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
 Environment: Debian Linux jessie
 OpenJDK Runtime Environment (build 1.8.0_40-internal-b27)
 OpenJDK 64-Bit Server VM (build 25.40-b25, mixed mode)
 7 * Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, 16/24 GB RAM per node, 1*system 
 disk + 4*1 or 2 TiB HDD for HDFS and local (on-prem, dedicated hardware)
 Scalding 0.13.1 modified with https://github.com/twitter/scalding/pull/1220 
 to run Cascading 3.0.0-wip-90 with TEZ 0.6.0
Reporter: Cyrille Chépélov
Assignee: Siddharth Seth
Priority: Critical
 Attachments: TEZ-2237-hack.branch6.txt, TEZ-2237-hack.master.txt, 
 TEZ-2237.1.master.txt, TEZ-2237.2.branch6.txt, TEZ-2237.2.master.txt, 
 TEZ-2237.test.2_branch0.6.txt, all_stacks.lst, alloc_mem.png, 
 alloc_vcores.png, application_142732418_1444.yarn-logs.red.txt.gz, 
 application_142732418_1908.red.txt.bz2, 
 application_1427964335235_2070.txt.red.txt.bz2, 
 appmastersyslog_dag_1427282048097_0215_1.red.txt.gz, 
 appmastersyslog_dag_1427282048097_0237_1.red.txt.gz, 
 gc_count_MRAppMaster.png, mem_free.png, noopexample_2237.txt, 
 oneOutOfTwoOutputsStarted.txt, ordered-grouped-kv-input-traces.diff, 
 output-starts.txt, start_containers.png, stop_containers.png, 
 syslog_attempt_1427282048097_0215_1_21_14_0.red.txt.gz, 
 syslog_attempt_1427282048097_0237_1_70_28_0.red.txt.gz, yarn_rm_flips.png


 On a specific DAG with many vertices (actually part of a larger meta-DAG), 
 after about an hour of processing, several BufferTooSmallExceptions are raised 
 in UnorderedPartitionedKVWriter (about one every two or three spills).
 Once these exceptions are raised, the DAG remains indefinitely active, 
 tying up memory and CPU resources as far as YARN is concerned, while little 
 if any actual processing takes place. 
 There seem to be two separate issues:
   1. BufferTooSmallExceptions are raised even though the actual keys and values 
 are never bigger than 24 and 1024 bytes respectively, small as the allocated 
 buffers seem to be (around a couple of megabytes were allotted, whereas 100MiB 
 were requested).
   2. Once BufferTooSmallExceptions are raised, the DAG fails to stop 
 (stop requests appear to be sent 7 hours after the BTSEs are raised, but 
 9 hours after these stop requests, the DAG was still lingering, with all 
 containers present and tying up memory and CPU allocations).
 The BTSEs prevent the Cascade from completing, which prevents validating 
 the results against the traditional MR1-based results. Because the DAG never 
 concludes, the cluster queue is rendered unavailable.
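As a rough sanity check of point 1, assume that "a couple of megabytes" means 2 MiB, and ignore any per-record framing or metadata overhead (both are assumptions, not figures from the report). Even a maximum-size record, a 24-byte key plus a 1024-byte value, would then fit about two thousand times, which is why the exception is surprising.

```java
// Rough capacity check under stated assumptions (2 MiB buffer, no per-record overhead).
class BufferCapacitySketch {
    static long recordsThatFit(long bufferBytes, long keyBytes, long valueBytes) {
        long recordBytes = keyBytes + valueBytes;
        return bufferBytes / recordBytes; // integer division: whole records only
    }

    public static void main(String[] args) {
        long buffer = 2L * 1024 * 1024; // assumed 2 MiB
        System.out.println(recordsThatFit(buffer, 24, 1024)); // prints 2001
    }
}
```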





[jira] [Commented] (TEZ-2392) Have all readers throw an Exception on incorrect next() usage

2015-05-04 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527538#comment-14527538
 ] 

Siddharth Seth commented on TEZ-2392:
-

Patch looks good to me. ValuesIterator has the check in the main path, but I'm 
not sure that can be avoided. +1
In the case of the MRReaders, placing the check after recordReader.next() leaves 
this open to an exception from user code. We should just document this (in 
MRInput.getReader()): "An exception will be thrown if next() is invoked after it 
has returned false, either from the framework or from the underlying InputFormat."
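A minimal sketch of the behaviour under discussion, using a hypothetical reader over a plain Iterator rather than Tez's actual reader classes: next() returns false once the data is exhausted, and any further call throws an IllegalStateException. Here the check runs before the underlying source is touched; in the MRReaders case described above, the check runs after recordReader.next(), so the InputFormat may throw first.

```java
import java.util.Iterator;

// Hypothetical reader illustrating "throw on next() after false" (not a Tez class).
class StrictReaderSketch<T> {
    private final Iterator<T> source;
    private boolean exhausted = false;
    private T current;

    StrictReaderSketch(Iterator<T> source) {
        this.source = source;
    }

    // Returns true and advances if a record is available; returns false exactly once at the end.
    boolean next() {
        if (exhausted) {
            // Checked before touching the source, so the framework, not user code, reports the misuse.
            throw new IllegalStateException("next() called after it already returned false");
        }
        if (source.hasNext()) {
            current = source.next();
            return true;
        }
        exhausted = true;
        current = null;
        return false;
    }

    T getCurrent() {
        return current;
    }
}
```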

 Have all readers throw an Exception on incorrect next() usage
 -

 Key: TEZ-2392
 URL: https://issues.apache.org/jira/browse/TEZ-2392
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Rajesh Balamohan
Priority: Critical
 Attachments: TEZ-2392.1.patch, TEZ-2392.2.patch


 Follow up from TEZ-2348.
 Marking as critical since this is a behaviour change, and we should get it in 
 early.





[jira] [Commented] (TEZ-2221) VertexGroup name should be unqiue

2015-05-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527549#comment-14527549
 ] 

Bikas Saha commented on TEZ-2221:
-

Disallowing this should be ok and sounds related to the jira since the output 
committer is identified by the vertex group name.
{code}
dag.createVertexGroup(group_1, v1,v2);
dag.createVertexGroup(group_1, v2,v3);
{code}

I would like to understand why this is being disallowed. From what I see, this 
would work for the async commit logic, since there is one async commit per output 
per vertex in the group, so separating by group name should be OK.
{code}
dag.createVertexGroup(group_1, v1,v2);
dag.createVertexGroup(group_2, v1,v2);
{code}
Is there any use case that can be supported here but not by combining them?


 VertexGroup name should be unqiue
 -

 Key: TEZ-2221
 URL: https://issues.apache.org/jira/browse/TEZ-2221
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Fix For: 0.7.0, 0.5.4, 0.6.1

 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, 
 TEZ-2221-4.patch


 VertexGroupCommitStartedEvent and VertexGroupCommitFinishedEvent use the vertex 
 group name to identify the vertex group commit, so vertex groups with the same 
 name will conflict. However, the current equals and hashCode of VertexGroup use 
 both the vertex group name and the member names.





[jira] [Updated] (TEZ-2237) Valid events should be sent out when an Output is not started

2015-05-04 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-2237:

Summary: Valid events should be sent out when an Output is not started  
(was: Complex DAG freezes and fails (was BufferTooSmallException raised in 
UnorderedPartitionedKVWriter then DAG lingers))

 Valid events should be sent out when an Output is not started
 -

 Key: TEZ-2237
 URL: https://issues.apache.org/jira/browse/TEZ-2237
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
 Environment: Debian Linux jessie
 OpenJDK Runtime Environment (build 1.8.0_40-internal-b27)
 OpenJDK 64-Bit Server VM (build 25.40-b25, mixed mode)
 7 * Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, 16/24 GB RAM per node, 1*system 
 disk + 4*1 or 2 TiB HDD for HDFS and local (on-prem, dedicated hardware)
 Scalding 0.13.1 modified with https://github.com/twitter/scalding/pull/1220 
 to run Cascading 3.0.0-wip-90 with TEZ 0.6.0
Reporter: Cyrille Chépélov
Assignee: Siddharth Seth
Priority: Critical
 Attachments: TEZ-2237-hack.branch6.txt, TEZ-2237-hack.master.txt, 
 TEZ-2237.1.master.txt, TEZ-2237.2.branch6.txt, TEZ-2237.2.master.txt, 
 TEZ-2237.3.branch6.txt, TEZ-2237.test.2_branch0.6.txt, all_stacks.lst, 
 alloc_mem.png, alloc_vcores.png, 
 application_142732418_1444.yarn-logs.red.txt.gz, 
 application_142732418_1908.red.txt.bz2, 
 application_1427964335235_2070.txt.red.txt.bz2, 
 appmastersyslog_dag_1427282048097_0215_1.red.txt.gz, 
 appmastersyslog_dag_1427282048097_0237_1.red.txt.gz, 
 gc_count_MRAppMaster.png, mem_free.png, noopexample_2237.txt, 
 oneOutOfTwoOutputsStarted.txt, ordered-grouped-kv-input-traces.diff, 
 output-starts.txt, start_containers.png, stop_containers.png, 
 syslog_attempt_1427282048097_0215_1_21_14_0.red.txt.gz, 
 syslog_attempt_1427282048097_0237_1_70_28_0.red.txt.gz, yarn_rm_flips.png


 On a specific DAG with many vertices (actually part of a larger meta-DAG), 
 after about an hour of processing, several BufferTooSmallExceptions are raised 
 in UnorderedPartitionedKVWriter (about one every two or three spills).
 Once these exceptions are raised, the DAG remains indefinitely active, 
 tying up memory and CPU resources as far as YARN is concerned, while little 
 if any actual processing takes place. 
 There seem to be two separate issues:
   1. BufferTooSmallExceptions are raised even though the actual keys and values 
 are never bigger than 24 and 1024 bytes respectively, small as the allocated 
 buffers seem to be (around a couple of megabytes were allotted, whereas 100MiB 
 were requested).
   2. Once BufferTooSmallExceptions are raised, the DAG fails to stop 
 (stop requests appear to be sent 7 hours after the BTSEs are raised, but 
 9 hours after these stop requests, the DAG was still lingering, with all 
 containers present and tying up memory and CPU allocations).
 The BTSEs prevent the Cascade from completing, which prevents validating 
 the results against the traditional MR1-based results. Because the DAG never 
 concludes, the cluster queue is rendered unavailable.





[jira] [Updated] (TEZ-2408) TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2

2015-05-04 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2408:
-
Affects Version/s: (was: 0.7.0)

 TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2 
 ---

 Key: TEZ-2408
 URL: https://issues.apache.org/jira/browse/TEZ-2408
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor
 Attachments: TEZ-2408.1.patch








[jira] [Updated] (TEZ-2408) TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2

2015-05-04 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2408:
-
Affects Version/s: 0.7.0

 TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2 
 ---

 Key: TEZ-2408
 URL: https://issues.apache.org/jira/browse/TEZ-2408
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor
 Attachments: TEZ-2408.1.patch








[jira] [Updated] (TEZ-2408) TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2

2015-05-04 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2408:
-
Target Version/s: 0.7.0

 TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2 
 ---

 Key: TEZ-2408
 URL: https://issues.apache.org/jira/browse/TEZ-2408
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor
 Attachments: TEZ-2408.1.patch








[jira] [Commented] (TEZ-2221) VertexGroup name should be unqiue

2015-05-04 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527479#comment-14527479
 ] 

Rohini Palaniswamy commented on TEZ-2221:
-

bq. dag.createVertexGroup(group_1, v1,v2); dag.createVertexGroup(group_2, 
v1,v2);
It should be a simple change for us to reuse the vertex group. But since we 
have never used it that way, we want to ensure that Tez will be fine if we 
construct plans like that.

bq. dag.createVertexGroup(group_1, v1,v2); dag.createVertexGroup(group_1, 
v2,v3);
We are not reusing group names anywhere. So that is not an issue for us.

 VertexGroup name should be unqiue
 -

 Key: TEZ-2221
 URL: https://issues.apache.org/jira/browse/TEZ-2221
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Fix For: 0.7.0, 0.5.4, 0.6.1

 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, 
 TEZ-2221-4.patch


 VertexGroupCommitStartedEvent and VertexGroupCommitFinishedEvent use the vertex 
 group name to identify the vertex group commit, so vertex groups with the same 
 name will conflict. However, the current equals and hashCode of VertexGroup use 
 both the vertex group name and the member names.





[jira] [Updated] (TEZ-2237) Complex DAG freezes and fails (was BufferTooSmallException raised in UnorderedPartitionedKVWriter then DAG lingers)

2015-05-04 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-2237:

Attachment: TEZ-2237.3.branch6.txt

Patch with the log line fixed for branch-0.6.

Thanks for the review, [~rajesh.balamohan], and thanks [~cchepelov] for reporting 
and helping try out the fix. Committing this.

 Complex DAG freezes and fails (was BufferTooSmallException raised in 
 UnorderedPartitionedKVWriter then DAG lingers)
 ---

 Key: TEZ-2237
 URL: https://issues.apache.org/jira/browse/TEZ-2237
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
 Environment: Debian Linux jessie
 OpenJDK Runtime Environment (build 1.8.0_40-internal-b27)
 OpenJDK 64-Bit Server VM (build 25.40-b25, mixed mode)
 7 * Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, 16/24 GB RAM per node, 1*system 
 disk + 4*1 or 2 TiB HDD for HDFS and local (on-prem, dedicated hardware)
 Scalding 0.13.1 modified with https://github.com/twitter/scalding/pull/1220 
 to run Cascading 3.0.0-wip-90 with TEZ 0.6.0
Reporter: Cyrille Chépélov
Assignee: Siddharth Seth
Priority: Critical
 Attachments: TEZ-2237-hack.branch6.txt, TEZ-2237-hack.master.txt, 
 TEZ-2237.1.master.txt, TEZ-2237.2.branch6.txt, TEZ-2237.2.master.txt, 
 TEZ-2237.3.branch6.txt, TEZ-2237.test.2_branch0.6.txt, all_stacks.lst, 
 alloc_mem.png, alloc_vcores.png, 
 application_142732418_1444.yarn-logs.red.txt.gz, 
 application_142732418_1908.red.txt.bz2, 
 application_1427964335235_2070.txt.red.txt.bz2, 
 appmastersyslog_dag_1427282048097_0215_1.red.txt.gz, 
 appmastersyslog_dag_1427282048097_0237_1.red.txt.gz, 
 gc_count_MRAppMaster.png, mem_free.png, noopexample_2237.txt, 
 oneOutOfTwoOutputsStarted.txt, ordered-grouped-kv-input-traces.diff, 
 output-starts.txt, start_containers.png, stop_containers.png, 
 syslog_attempt_1427282048097_0215_1_21_14_0.red.txt.gz, 
 syslog_attempt_1427282048097_0237_1_70_28_0.red.txt.gz, yarn_rm_flips.png


 On a specific DAG with many vertices (actually part of a larger meta-DAG), 
 after about an hour of processing, several BufferTooSmallExceptions are raised 
 in UnorderedPartitionedKVWriter (about one every two or three spills).
 Once these exceptions are raised, the DAG remains indefinitely active, 
 tying up memory and CPU resources as far as YARN is concerned, while little 
 if any actual processing takes place. 
 There seem to be two separate issues:
   1. BufferTooSmallExceptions are raised even though the actual keys and values 
 are never bigger than 24 and 1024 bytes respectively, small as the allocated 
 buffers seem to be (around a couple of megabytes were allotted, whereas 100MiB 
 were requested).
   2. Once BufferTooSmallExceptions are raised, the DAG fails to stop 
 (stop requests appear to be sent 7 hours after the BTSEs are raised, but 
 9 hours after these stop requests, the DAG was still lingering, with all 
 containers present and tying up memory and CPU allocations).
 The BTSEs prevent the Cascade from completing, which prevents validating 
 the results against the traditional MR1-based results. Because the DAG never 
 concludes, the cluster queue is rendered unavailable.





[jira] [Commented] (TEZ-2198) Fix sorter spill counts

2015-05-04 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527423#comment-14527423
 ] 

Hitesh Shah commented on TEZ-2198:
--

\cc [~gopalv] [~sseth] for review

 Fix sorter spill counts
 ---

 Key: TEZ-2198
 URL: https://issues.apache.org/jira/browse/TEZ-2198
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: TEZ-2198.1.patch, TEZ-2198.2.patch, TEZ-2198.3.patch, 
 TEZ-2198.4.patch, no_additional_spills_eg_pipelined_shuffle.png, 
 with_additional_spills.png


 Prior to pipelined shuffle, Tez merged all spilled data into a single file.  
 This produced one index file and one output file. In this context, 
 TaskCounter.ADDITIONAL_SPILL_COUNT referred to the number of additional 
 spills, and no counter was needed to track the number of merges.
 With pipelined shuffle, there is no final merge, and ADDITIONAL_SPILL_COUNT 
 would be misleading, as these spills are direct output files that are 
 consumed by the consumers.
 It would be good to have the following:
 - ADDITIONAL_SPILL_COUNT: represents the spills that are needed by the task 
 to generate the final merged output
 - TOTAL_SPILLS: represents the total number of shuffle directories (index + 
 output files) that got created at the end of processing.
 For example, assume the sorter generated 5 spills in an attempt.
 Without pipelining:
 ==
 ADDITIONAL_SPILL_COUNT = 5 -- Additional spills involved in sorting
 TOTAL_SPILLS = 1 -- Final merged output
 With pipelining:
 ==
 ADDITIONAL_SPILL_COUNT = 0 -- Additional spills involved in sorting
 TOTAL_SPILLS = 5 -- All spills are final output
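The 5-spill example can be encoded as a small, hypothetical Java helper. It follows only the rule stated in this issue; edge cases the issue does not spell out (for example, whether a single unpipelined spill counts as "additional") are not modelled, and this is not Tez's sorter code.

```java
// Hypothetical helper encoding the counter semantics proposed above (not Tez's sorter code).
class SpillCounterSketch {
    static final class Counters {
        final long additionalSpillCount;
        final long totalSpills;

        Counters(long additionalSpillCount, long totalSpills) {
            this.additionalSpillCount = additionalSpillCount;
            this.totalSpills = totalSpills;
        }
    }

    static Counters forAttempt(int spills, boolean pipelined) {
        if (pipelined) {
            // No final merge: every spill is itself a final output consumed downstream.
            return new Counters(0, spills);
        }
        // All spills are merged into one final index + output pair.
        return new Counters(spills, 1);
    }

    public static void main(String[] args) {
        Counters merged = forAttempt(5, false);
        Counters pipelined = forAttempt(5, true);
        // prints "without pipelining: 5/1, with pipelining: 0/5"
        System.out.println("without pipelining: " + merged.additionalSpillCount + "/" + merged.totalSpills
            + ", with pipelining: " + pipelined.additionalSpillCount + "/" + pipelined.totalSpills);
    }
}
```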





[jira] [Updated] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED

2015-05-04 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2379:
-
Attachment: TEZ-2379.branch-0.5.patch

 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 T_ATTEMPT_KILLED at KILLED
 --

 Key: TEZ-2379
 URL: https://issues.apache.org/jira/browse/TEZ-2379
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Hitesh Shah
Priority: Blocker
 Attachments: TEZ-2379.1.patch, TEZ-2379.2.patch, TEZ-2379.3.patch, 
 TEZ-2379.branch-0.5.patch


 {noformat}
 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_13
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
 at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
 at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
 at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853)
 at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106)
 at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874)
 at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860)
 at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
 at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
 at java.lang.Thread.run(Thread.java:745)
 {noformat}
 Additional notes:
 - Hive: latest build
 - Tez: master
 - Workload: TPC-H at 200 GB scale, query 17 (job killed in the middle of execution)
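The exception means the task's state machine had no transition registered for T_ATTEMPT_KILLED while already in KILLED, so {{StateMachineFactory.doTransition}} rejected the event. The hypothetical sketch below reproduces that failure mode with a plain transition table and shows one way a late event could be tolerated (a self-transition that ignores it). {{TaskStateSketch}}, its states and the {{ignoreLateAttemptKilled}} flag are illustrative assumptions; this is not Tez's {{TaskImpl}} and not the change in the attached patches.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical transition-table state machine; not Tez or Hadoop code.
public class TaskStateSketch {

    enum State { RUNNING, KILL_WAIT, KILLED }
    enum Event { T_KILL, T_ATTEMPT_KILLED }

    static final class InvalidTransitionException extends RuntimeException {
        InvalidTransitionException(State s, Event e) {
            super("Invalid event: " + e + " at " + s);
        }
    }

    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.RUNNING;

    TaskStateSketch(boolean ignoreLateAttemptKilled) {
        addTransition(State.RUNNING, Event.T_KILL, State.KILL_WAIT);
        addTransition(State.KILL_WAIT, Event.T_ATTEMPT_KILLED, State.KILLED);
        if (ignoreLateAttemptKilled) {
            // Self-transition: a late attempt-killed event in KILLED is a no-op.
            addTransition(State.KILLED, Event.T_ATTEMPT_KILLED, State.KILLED);
        }
    }

    private void addTransition(State from, Event on, State to) {
        table.computeIfAbsent(from, k -> new EnumMap<>(Event.class)).put(on, to);
    }

    State handle(Event e) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(e);
        if (next == null) {
            // No registered transition for (current, e): reject the event.
            throw new InvalidTransitionException(current, e);
        }
        current = next;
        return current;
    }

    public static void main(String[] args) {
        for (boolean ignore : new boolean[] {false, true}) {
            TaskStateSketch task = new TaskStateSketch(ignore);
            task.handle(Event.T_KILL);
            task.handle(Event.T_ATTEMPT_KILLED); // task is now KILLED
            try {
                // Simulated late event arriving after the task reached KILLED.
                task.handle(Event.T_ATTEMPT_KILLED);
                System.out.println("ignore=" + ignore + ": stays " + task.current);
            } catch (InvalidTransitionException ex) {
                System.out.println("ignore=" + ignore + ": " + ex.getMessage());
            }
        }
    }
}
```

Without the self-transition, the late event produces the same "Invalid event: T_ATTEMPT_KILLED at KILLED" shape of error as the log; with it, the task simply stays KILLED. Whether ignoring the event is the right fix for Tez is a separate design question this sketch does not settle.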





[jira] [Updated] (TEZ-2408) TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2

2015-05-04 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2408:
-
Attachment: TEZ-2408.1.patch

[~bikassaha] [~sseth] review please. 

 TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2 
 ---

 Key: TEZ-2408
 URL: https://issues.apache.org/jira/browse/TEZ-2408
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor
 Attachments: TEZ-2408.1.patch








[jira] [Comment Edited] (TEZ-2407) Drop references to the old DataInputBuffer impl in PipelinedSorter

2015-05-04 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527144#comment-14527144
 ] 

Gopal V edited comment on TEZ-2407 at 5/4/15 7:49 PM:
--

No, the issue is that {{DataInputBuffer::getLength()}} has bad semantics - it 
returns capacity instead of length of data.

We are always forced to do {{DataInputBuffer::getLength() - 
DataInputBuffer::getPosition()}} to get the accurate value, and that's an easy 
thing to forget.

Since {{DataInputBuffer}} comes from Hadoop, we can't change the original; 
however, it is a simple class, so we can make our code more readable by 
replacing it and making getLength() meaningful.
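The difference between the two getLength() semantics can be illustrated with a self-contained sketch. Both classes are hypothetical stand-ins, not Hadoop's {{DataInputBuffer}} or any Tez class: {{LegacyStyleBuffer}} models getLength() as the end offset of the valid data (so callers must subtract getPosition()), while {{ReadableBuffer}} returns the number of unread bytes directly.

```java
// Hypothetical sketch of the two getLength() semantics discussed above.
// Neither class is Hadoop's DataInputBuffer or a Tez class.
public class BufferLengthSketch {

    /** Shared state: a backing array, a read position and an end offset. */
    abstract static class BaseBuffer {
        protected byte[] data = new byte[0];
        protected int position;
        protected int end;

        void reset(byte[] input, int start, int length) {
            data = input;
            position = start;
            end = start + length;
        }

        int getPosition() {
            return position;
        }

        /** Reads one byte, or returns -1 once the valid data is exhausted. */
        int read() {
            return position < end ? (data[position++] & 0xff) : -1;
        }

        abstract int getLength();
    }

    /** Models the awkward semantics: getLength() is an end offset. */
    static final class LegacyStyleBuffer extends BaseBuffer {
        @Override
        int getLength() {
            return end; // callers must remember to subtract getPosition()
        }
    }

    /** Illustrative replacement: getLength() is the number of unread bytes. */
    static final class ReadableBuffer extends BaseBuffer {
        @Override
        int getLength() {
            return end - position;
        }
    }

    public static void main(String[] args) {
        byte[] backing = new byte[64];  // larger than the valid data
        LegacyStyleBuffer legacy = new LegacyStyleBuffer();
        ReadableBuffer readable = new ReadableBuffer();
        legacy.reset(backing, 8, 16);   // 16 valid bytes at offset 8
        readable.reset(backing, 8, 16);
        legacy.read();                  // consume one byte from each
        readable.read();

        System.out.println("legacy getLength(): " + legacy.getLength());
        System.out.println("legacy remaining: " + (legacy.getLength() - legacy.getPosition()));
        System.out.println("readable getLength(): " + readable.getLength());
    }
}
```

In this run the legacy getLength() reports 24 while only 15 bytes remain; the replacement reports 15 without the extra subtraction.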


was (Author: gopalv):
No, the issue is that {{DataInputBuffer::getLength()}} has bad semantics - it 
returns capacity instead of length of data.

Since that comes from Hadoop, we can't change the original; however, it is a 
simple class, so we can make our code more readable by replacing it and making 
getLength() meaningful.

 Drop references to the old DataInputBuffer impl in PipelinedSorter
 --

 Key: TEZ-2407
 URL: https://issues.apache.org/jira/browse/TEZ-2407
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan







[jira] [Created] (TEZ-2408) TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2

2015-05-04 Thread Hitesh Shah (JIRA)
Hitesh Shah created TEZ-2408:


 Summary: TestTaskAttempt fails to compile against hadoop-2.4 and 
hadoop-2.2 
 Key: TEZ-2408
 URL: https://issues.apache.org/jira/browse/TEZ-2408
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah








[jira] [Updated] (TEZ-2408) TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2

2015-05-04 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2408:
-
Priority: Minor  (was: Major)

 TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2 
 ---

 Key: TEZ-2408
 URL: https://issues.apache.org/jira/browse/TEZ-2408
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Minor







[jira] [Commented] (TEZ-1529) ATS and TezClient integration in secure kerberos enabled cluster

2015-05-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527361#comment-14527361
 ] 

Zhijie Shen commented on TEZ-1529:
--

The patch looks good to me overall. Two nits:

1. In getJsonRootEntity, you may need to handle UndeclaredThrowableException 
too.

2. I think we can reuse the HTTP client. It's not necessary to create one 
client per request.
{code}
httpClient = new Client(new URLConnectionClientHandler(new TimelineUrlConnectionFactory()),
    config);
{code}
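For nit 1, one common way to end up with an UndeclaredThrowableException is a call through a java.lang.reflect.Proxy whose handler throws a checked exception that the interface method does not declare. The sketch below assumes that situation; {{EntitySource}}, {{fetchRootEntity}} and {{failingSource}} are hypothetical names, not the methods in the TEZ-1529 patch. It unwraps the exception and rethrows the underlying IOException.

```java
import java.io.IOException;
import java.lang.reflect.Proxy;
import java.lang.reflect.UndeclaredThrowableException;

// Hypothetical sketch for review nit 1; names are illustrative only.
public class UndeclaredUnwrapSketch {

    /** A call whose signature does not declare IOException. */
    interface EntitySource {
        String fetch(String path);
    }

    /** Strips UndeclaredThrowableException wrappers to reach the real cause. */
    static Throwable unwrap(Throwable t) {
        while (t instanceof UndeclaredThrowableException && t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }

    /** Rethrows the underlying IOException so callers can handle it normally. */
    static String fetchRootEntity(EntitySource source, String path) throws IOException {
        try {
            return source.fetch(path);
        } catch (UndeclaredThrowableException e) {
            Throwable cause = unwrap(e);
            if (cause instanceof IOException) {
                throw (IOException) cause;
            }
            throw new IOException("Unexpected failure fetching " + path, cause);
        }
    }

    /** Proxy whose handler throws an IOException the interface does not declare. */
    static EntitySource failingSource() {
        return (EntitySource) Proxy.newProxyInstance(
            EntitySource.class.getClassLoader(),
            new Class<?>[] {EntitySource.class},
            (proxy, method, methodArgs) -> {
                throw new IOException("connection refused");
            });
    }

    public static void main(String[] args) {
        try {
            fetchRootEntity(failingSource(), "/ws/v1/timeline/");
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The JDK wraps the handler's IOException in an UndeclaredThrowableException; fetchRootEntity turns it back into the IOException, so main prints "caught: connection refused".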

 ATS and TezClient integration  in secure kerberos enabled cluster
 -

 Key: TEZ-1529
 URL: https://issues.apache.org/jira/browse/TEZ-1529
 Project: Apache Tez
  Issue Type: Bug
Reporter: Prakash Ramachandran
Assignee: Prakash Ramachandran
Priority: Blocker
 Attachments: TEZ-1529.1.patch


 This is a follow-up to TEZ-1495, which addressed ATS-TezClient integration. 
 However, it does not enable it in a secure, Kerberos-enabled cluster. 





[jira] [Created] (TEZ-2409) Allow different edges to have different routing plugins

2015-05-04 Thread Bikas Saha (JIRA)
Bikas Saha created TEZ-2409:
---

 Summary: Allow different edges to have different routing plugins
 Key: TEZ-2409
 URL: https://issues.apache.org/jira/browse/TEZ-2409
 Project: Apache Tez
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Bikas Saha


It may be useful to allow different edge manager plugin types based on 
different requirements. In order to support this, we would need to support 
different plugins per edge for routing the events on that edge.
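A per-edge plugin lookup could look roughly like the sketch below. {{RoutingPlugin}}, {{OneToOneRouting}}, {{BroadcastRouting}} and {{EdgeRouter}} are hypothetical names and do not correspond to Tez's EdgeManagerPlugin API; the sketch only shows a router choosing a plugin by edge id instead of applying one plugin type everywhere.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of per-edge routing plugins; not Tez's API.
public class EdgeRoutingSketch {

    /** Decides which destination tasks receive an event from a source task. */
    interface RoutingPlugin {
        List<Integer> route(int sourceTask, int numDestinationTasks);
    }

    /** Source task i sends only to destination task i (equal parallelism assumed). */
    static final class OneToOneRouting implements RoutingPlugin {
        public List<Integer> route(int sourceTask, int numDestinationTasks) {
            if (sourceTask < 0 || sourceTask >= numDestinationTasks) {
                throw new IllegalArgumentException("one-to-one needs equal parallelism");
            }
            List<Integer> out = new ArrayList<>();
            out.add(sourceTask);
            return out;
        }
    }

    /** Every source task sends to every destination task. */
    static final class BroadcastRouting implements RoutingPlugin {
        public List<Integer> route(int sourceTask, int numDestinationTasks) {
            List<Integer> out = new ArrayList<>();
            for (int i = 0; i < numDestinationTasks; i++) {
                out.add(i);
            }
            return out;
        }
    }

    /** Looks up the plugin registered for each edge. */
    static final class EdgeRouter {
        private final Map<String, RoutingPlugin> pluginsByEdge = new HashMap<>();

        void register(String edgeId, RoutingPlugin plugin) {
            pluginsByEdge.put(edgeId, plugin);
        }

        List<Integer> route(String edgeId, int sourceTask, int numDestinationTasks) {
            RoutingPlugin plugin = pluginsByEdge.get(edgeId);
            if (plugin == null) {
                throw new IllegalArgumentException("No routing plugin for edge " + edgeId);
            }
            return plugin.route(sourceTask, numDestinationTasks);
        }
    }

    public static void main(String[] args) {
        EdgeRouter router = new EdgeRouter();
        router.register("v1->v2", new OneToOneRouting());
        router.register("v3->v2", new BroadcastRouting());
        System.out.println(router.route("v1->v2", 2, 4)); // [2]
        System.out.println(router.route("v3->v2", 2, 4)); // [0, 1, 2, 3]
    }
}
```

Keeping the plugin behind a per-edge map lets two edges into the same vertex route events differently without either plugin knowing about the other.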





[jira] [Commented] (TEZ-2407) Drop references to the old DataInputBuffer impl in PipelinedSorter

2015-05-04 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527135#comment-14527135
 ] 

Hitesh Shah commented on TEZ-2407:
--

If this is a refactor, then 0.8 makes sense. I was not sure whether this was 
related to any memory-related cleanup, based on the JIRA title. 

 Drop references to the old DataInputBuffer impl in PipelinedSorter
 --

 Key: TEZ-2407
 URL: https://issues.apache.org/jira/browse/TEZ-2407
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan







[jira] [Commented] (TEZ-2407) Drop references to the old DataInputBuffer impl in PipelinedSorter

2015-05-04 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527144#comment-14527144
 ] 

Gopal V commented on TEZ-2407:
--

No, the issue is that {{DataInputBuffer::getLength()}} has bad semantics - it 
returns capacity instead of length of data.

Since that comes from Hadoop, we can't change the original; however, it is a 
simple class, so we can make our code more readable by replacing it and making 
getLength() meaningful.

 Drop references to the old DataInputBuffer impl in PipelinedSorter
 --

 Key: TEZ-2407
 URL: https://issues.apache.org/jira/browse/TEZ-2407
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan




