date:20150323

[jira] [Commented] (TEZ-2219) Should verify the input_name/output_name to be unique per vertex

2015-03-23 Thread Jeff Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375485#comment-14375485
 ] 

Jeff Zhang commented on TEZ-2219:
-

Thanks [~hitesh]. Committed to master, branch-0.5 and branch-0.6.

> Should verify the input_name/output_name to be unique per vertex
> 
>
> Key: TEZ-2219
> URL: https://issues.apache.org/jira/browse/TEZ-2219
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Fix For: 0.5.4
>
> Attachments: TEZ-2219-1.txt, TEZ-2219-2.patch, TEZ-2219-3.patch
>
>
> RuntimeTask looks up each Input/Output by its input_name/output_name, so 
> input_name/output_name must be unique per vertex.
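The following is a minimal sketch of the kind of per-vertex check this issue asks for. It assumes that input names and output names are validated separately, because the issue text does not say whether they share a single namespace. VertexIONameValidator and checkUnique are hypothetical names, not the code in the attached patches.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical validator; not Tez's DAG verification code.
final class VertexIONameValidator {

  private VertexIONameValidator() {}

  /**
   * Throws IllegalStateException if a name appears more than once, because a
   * task that looks up its Inputs/Outputs by name could not tell them apart.
   */
  static void checkUnique(String vertexName, String kind, List<String> names) {
    Set<String> seen = new HashSet<>();
    for (String name : names) {
      if (!seen.add(name)) {  // add() returns false if the name was already present
        throw new IllegalStateException(
            "Duplicate " + kind + " name '" + name + "' in vertex " + vertexName);
      }
    }
  }

  public static void main(String[] args) {
    checkUnique("v1", "input", Arrays.asList("in1", "in2"));  // passes
    try {
      checkUnique("v1", "output", Arrays.asList("out1", "out1"));
    } catch (IllegalStateException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```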



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (TEZ-714) OutputCommitters should not run in the main AM dispatcher thread

2015-03-23 Thread Jeff Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-714:
---
Attachment: TEZ-714-2.patch

> OutputCommitters should not run in the main AM dispatcher thread
> 
>
> Key: TEZ-714
> URL: https://issues.apache.org/jira/browse/TEZ-714
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Jeff Zhang
>Priority: Critical
> Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch, Vertex_2.pdf
>
>
> Follow up jira from TEZ-41.
> 1) If there are multiple OutputCommitters on a Vertex, they can be run in 
> parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event 
> handling, w.r.t the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.
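One way to address points 1) and 2) is sketched below under stated assumptions. A single-threaded executor stands in for the AM dispatcher, and a fixed pool runs the commits in parallel. Each completion is posted back to the dispatcher as an event, so state changes remain single-threaded. Committer, AsyncCommitSketch and the pool size of 4 are hypothetical. This is not Tez's OutputCommitter API or the approach taken in the attached patches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an output committer (not Tez's OutputCommitter).
interface Committer {
  void commit() throws Exception;
}

final class AsyncCommitSketch {
  // Single thread: plays the role of the AM's central event dispatcher.
  final ExecutorService dispatcher = Executors.newSingleThreadExecutor();
  // Shared pool on which the (possibly slow) commits actually run, in parallel.
  private final ExecutorService commitPool = Executors.newFixedThreadPool(4);
  private final AtomicInteger pending = new AtomicInteger();
  private final CountDownLatch allDone = new CountDownLatch(1);
  private volatile boolean anyFailed = false;

  /** Hands every commit to the pool and returns without blocking the caller. */
  void startCommits(List<Committer> committers) {
    if (committers.isEmpty()) {
      allDone.countDown();
      return;
    }
    pending.set(committers.size());
    for (Committer c : committers) {
      commitPool.submit(() -> {
        boolean ok;
        try {
          c.commit();
          ok = true;
        } catch (Exception e) {
          ok = false;
        }
        final boolean succeeded = ok;
        // Report back through the dispatcher so bookkeeping stays single-threaded.
        dispatcher.submit(() -> onCommitCompleted(succeeded));
      });
    }
  }

  // Runs on the dispatcher thread only.
  private void onCommitCompleted(boolean succeeded) {
    if (!succeeded) {
      anyFailed = true;
    }
    if (pending.decrementAndGet() == 0) {
      allDone.countDown();
    }
  }

  /** Returns true if every commit finished and succeeded within the timeout. */
  boolean awaitResult(long timeoutSeconds) throws InterruptedException {
    boolean finished = allDone.await(timeoutSeconds, TimeUnit.SECONDS);
    commitPool.shutdown();
    dispatcher.shutdown();
    return finished && !anyFailed;
  }

  public static void main(String[] args) throws Exception {
    AsyncCommitSketch sketch = new AsyncCommitSketch();
    List<Committer> committers = new ArrayList<>();
    for (int i = 0; i < 3; i++) {
      final int id = i;
      committers.add(() -> {
        Thread.sleep(100);  // simulate a slow commit
        System.out.println("committed output " + id);
      });
    }
    // Kick off the commits from the dispatcher thread, as an event handler would.
    sketch.dispatcher.submit(() -> sketch.startCommits(committers));
    System.out.println("all commits succeeded: " + sketch.awaitResult(10));
  }
}
```

Because the dispatcher only enqueues work and handles completion events, a slow commit ties up a pool thread rather than the event queue.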




[jira] [Commented] (TEZ-714) OutputCommitters should not run in the main AM dispatcher thread

2015-03-23 Thread Jeff Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375512#comment-14375512
 ] 

Jeff Zhang commented on TEZ-714:


Uploaded a new patch. [~bikassaha] Please help review it.

* Wrap the commit in a CallableEvent in both DAG & Vertex, but still call 
abort inline. Making abort async would complicate the patch, so it stays a 
sync call as before.
* Introduce a new COMMITTING state for Vertex & DAG
** Vertex's COMMITTING means the vertex is in the middle of committing. If the 
vertex has no committers or TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is true, 
the vertex does not go to the COMMITTING state.
** DAG's COMMITTING covers 2 cases: one is when 
TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is true and all the vertices are 
completed; the other is when TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is 
false and all the vertices are completed, but some vertex group committers 
are still running.
* Regarding the issue of "not sure why group-commit and non-group commit need 
to be differentiated in different transitions.", I renamed them to 
NonFinalCommitCompletedTransition and FinalCommitCompletetionTransition (maybe 
there are better names). The first handles committers when 
TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is false and the second when 
TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is true. The reason I differentiate 
them is that for the NonFinalCommitCompletedEvent we need to write a 
VertexGroupCommitCompletedEvent to the recovery log, which is not necessary 
for FinalCommitCompletedEvent.
* The unit tests are still not perfect. Currently, in the DAGImpl/VertexImpl 
unit tests the shared thread pool runs in the AsyncDispatcher thread (that is, 
the committer still runs in the AsyncDispatcher thread). This may hide some 
potential issues, and in this threading mode it is not possible to test some 
cases, such as killing the DAG while it is committing. I am trying to think of 
ways to simulate the shared thread pool in the unit tests.
* For some existing transitions, like (RUNNING to ERROR due to INTERNAL 
ERROR), I am not sure why they go to ERROR directly rather than TERMINATING. 
Maybe it is to allow the client to get the final status as early as possible.
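The COMMITTING rules in the second bullet can be condensed into two pure decision functions, sketched below. The boolean flag stands in for TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS. The method names and the reduced state sets are hypothetical. The sketch also ignores failures, kills, and the case of a DAG that has no committers at all.

```java
// Hypothetical sketch of the state choices described above; the enums and
// method names are illustrative, not Tez's VertexImpl/DAGImpl code.
enum VertexState { COMMITTING, SUCCEEDED }

enum DagState { COMMITTING, SUCCEEDED }

final class CommitStateSketch {

  private CommitStateSketch() {}

  /** Next vertex state once all of the vertex's tasks have succeeded. */
  static VertexState vertexStateWhenTasksDone(int numCommitters,
      boolean commitAllOutputsOnDagSuccess) {
    if (numCommitters == 0 || commitAllOutputsOnDagSuccess) {
      // Nothing to commit now: either no committers, or commits are deferred to the DAG.
      return VertexState.SUCCEEDED;
    }
    return VertexState.COMMITTING;
  }

  /** Next DAG state once all vertices have completed. */
  static DagState dagStateWhenVerticesDone(boolean commitAllOutputsOnDagSuccess,
      int runningGroupCommits) {
    if (commitAllOutputsOnDagSuccess) {
      return DagState.COMMITTING;  // case 1: DAG-level commit of all outputs
    }
    if (runningGroupCommits > 0) {
      return DagState.COMMITTING;  // case 2: vertex-group commits still running
    }
    return DagState.SUCCEEDED;
  }
}
```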






[jira] [Comment Edited] (TEZ-714) OutputCommitters should not run in the main AM dispatcher thread

2015-03-23 Thread Jeff Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375512#comment-14375512
 ] 

Jeff Zhang edited comment on TEZ-714 at 3/23/15 8:04 AM:
-

Uploaded a new patch. [~bikassaha] Please help review it.

* Wrap the commit in a CallableEvent in both DAG & Vertex, but still call 
abort inline. Making abort async would complicate the patch, so it stays a 
sync call as before.
* Introduce a new COMMITTING state for Vertex & DAG
** Vertex's COMMITTING means the vertex is in the middle of committing. If the 
vertex has no committers or TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is true, 
the vertex does not go to the COMMITTING state.
** DAG's COMMITTING covers 2 cases: one is when 
TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is true and all the vertices are 
completed; the other is when TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is 
false and all the vertices are completed, but some vertex group committers 
are still running.
* Regarding the issue of "not sure why group-commit and non-group commit need 
to be differentiated in different transitions.", I renamed them to 
NonFinalCommitCompletedTransition and FinalCommitCompletetionTransition (maybe 
there are better names). The first handles committers when 
TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is false and the second when 
TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is true. The reason I differentiate 
them is that for the NonFinalCommitCompletedEvent we need to write a 
VertexGroupCommitCompletedEvent to the recovery log, which is not necessary 
for FinalCommitCompletedEvent.
* The unit tests are still not perfect. Currently, in the DAGImpl/VertexImpl 
unit tests the shared thread pool runs in the AsyncDispatcher thread (that is, 
the committer still runs in the AsyncDispatcher thread). This may hide some 
potential issues, and in this threading mode it is not possible to test some 
cases, such as killing the DAG while it is committing. I am trying to think of 
ways to simulate the shared thread pool in the unit tests.
* For some existing transitions, like (RUNNING to ERROR due to INTERNAL 
ERROR), I am not sure why they go to ERROR directly rather than TERMINATING. 
Maybe it is to allow the client to get the final status as early as possible.






[jira] [Updated] (TEZ-2186) OOM with a simple scatter gather job with re-use

2015-03-23 Thread Rajesh Balamohan (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-2186:
--
Attachment: TEZ-2186-branch-0.6.patch

Looks like I didn't upload the branch-0.6 patch to this JIRA earlier.

> OOM with a simple scatter gather job with re-use
> 
>
> Key: TEZ-2186
> URL: https://issues.apache.org/jira/browse/TEZ-2186
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Rajesh Balamohan
> Fix For: 0.7.0
>
> Attachments: TEZ-2186-branch-0.6.patch, TEZ-2186.1.patch, 
> TEZ-2186.2.patch, noopexample.txt
>
>
> With a no-op scatter gather job, 20K x 2K, on a 20 node cluster with 20 2GB 
> containers per node - reducers end up failing with OOM errors. Haven't been 
> able to generate a heap dump yet. Will add details as they're found. 




[jira] [Updated] (TEZ-2196) Consider reusing UnorderedPartitionedKVWriter with single output in UnorderedKVOutput

2015-03-23 Thread Rajesh Balamohan (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-2196:
--
Attachment: TEZ-2196.3.patch

Addressing review comments.

UnorderedKVOutput - instead of using the HashPartitioner, we should introduce 
a custom partitioner which always returns partition 0.
- Fixed. Created CustomPartitioner in UnorderedKVOutput itself, which always 
returns 0, and marked it as @Private.

UnorderedKVOutput - confKeys needs additional properties like the BUFFER_SIZE 
used by the partitionedWriter, and any other config keys that it uses.
- Added TEZ_RUNTIME_UNORDERED_OUTPUT_BUFFER_SIZE_MB and 
TEZ_RUNTIME_UNORDERED_OUTPUT_MAX_PER_BUFFER_SIZE_BYTES.

In the test, the HashPartitioner is being set up. Is this required, since the 
Output sets this up anyway?
- Removed it from the test cases.

Regarding the special case in UnorderedPartitionedKVWriter:
- When there is only one partition and pipelining is disabled, the current 
patch appends directly to the IFile, completely skipping the buffers and the 
merge.
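The two behaviours described above can be sketched as follows, assuming a partitioner interface with the shape getPartition(key, value, numPartitions). The interface is declared locally so that the example compiles on its own. SinglePartitionPartitioner and WriterPathSketch are hypothetical names rather than the classes in TEZ-2196.3.patch.

```java
// Hypothetical local interface mirroring the getPartition(key, value, numPartitions)
// shape of a runtime partitioner, declared here so the sketch is self-contained.
interface Partitioner {
  int getPartition(Object key, Object value, int numPartitions);
}

/** Routes every record to partition 0, for an output with exactly one partition. */
final class SinglePartitionPartitioner implements Partitioner {
  @Override
  public int getPartition(Object key, Object value, int numPartitions) {
    return 0;
  }
}

final class WriterPathSketch {

  enum Path { DIRECT_IFILE_APPEND, BUFFERED_SPILL_AND_MERGE }

  private WriterPathSketch() {}

  /** Chooses the write path; the fast path mirrors the special case in the comment. */
  static Path choose(int numPartitions, boolean pipelinedShuffleEnabled) {
    if (numPartitions == 1 && !pipelinedShuffleEnabled) {
      return Path.DIRECT_IFILE_APPEND;  // no in-memory buffers, no final merge
    }
    return Path.BUFFERED_SPILL_AND_MERGE;
  }
}
```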

> Consider reusing UnorderedPartitionedKVWriter with single output in 
> UnorderedKVOutput
> -
>
> Key: TEZ-2196
> URL: https://issues.apache.org/jira/browse/TEZ-2196
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: TEZ-2196.1.patch, TEZ-2196.2.patch, TEZ-2196.3.patch
>
>
> Can possibly get rid of FileBasedKVWriter and reuse 
> UnorderedPartitionedKVWriter with single partition in UnorderedKVOutput.  
> This can also benefit from pipelined shuffle changes done in 
> UnorderedPartitionedKVWriter.




Success: TEZ-2196 PreCommit Build #330

2015-03-23 Thread Apache Jenkins Server

Jira: https://issues.apache.org/jira/browse/TEZ-2196
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/330/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2762 lines...]
[INFO] Final Memory: 79M/900M
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12706508/TEZ-2196.3.patch
  against master revision aa784be.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/330//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/330//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
1e29b7fd5ef921eb35a85f522e7ef6d3951966d8 logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #329
Archived 44 artifacts
Archive block size is 32768
Received 6 blocks and 2526255 bytes
Compression is 7.2%
Took 1 sec
Description set: TEZ-2196
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2196) Consider reusing UnorderedPartitionedKVWriter with single output in UnorderedKVOutput

2015-03-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375762#comment-14375762
 ] 

Hadoop QA commented on TEZ-2196:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12706508/TEZ-2196.3.patch
  against master revision aa784be.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/330//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/330//console

This message is automatically generated.





[jira] [Commented] (TEZ-2214) FetcherOrderedGrouped can get stuck indefinitely when MergeManager misses memToDiskMerging

2015-03-23 Thread Hitesh Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376374#comment-14376374
 ] 

Hitesh Shah commented on TEZ-2214:
--

[~rajesh.balamohan] A question on the newly added invocation of 
"startMemToDiskMerge": what happens when startMemToDiskMerge() is called while 
a merge is in progress? It seems like startMemToDiskMerge() is a no-op in that 
case.





> FetcherOrderedGrouped can get stuck indefinitely when MergeManager misses 
> memToDiskMerging
> --
>
> Key: TEZ-2214
> URL: https://issues.apache.org/jira/browse/TEZ-2214
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: TEZ-2214.1.patch
>
>
> Scenario:
> - commitMemory & usedMemory are beyond their allowed thresholds.
> - InMemoryMerge kicks off and is in the process of flushing memory contents 
> to disk.
> - As it progresses, it also releases memory segments (but it has not finished yet).
> - Fetchers that need memory < maxSingleShuffleLimit get scheduled.
> - If the fetchers are fast, this quickly adds to commitMemory & usedMemory. 
> Since InMemoryMerge is already in progress, this doesn't trigger another 
> merge().
> - Pretty soon all fetchers are stalled in the following state:
> {noformat}
> Thread 9351: (state = BLOCKED)
>  - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be 
> imprecise)
>  - java.lang.Object.wait() @bci=2, line=502 (Compiled frame)
>  - 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.waitForShuffleToMergeMemory()
>  @bci=17, line=337 (Interpreted frame)
>  - 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.run()
>  @bci=34, line=157 (Interpreted frame)
> {noformat}
> - Even after InMemoryMerger completes, "commitedMem & usedMem" are still beyond 
> their threshold, and there are no other fetcher threads (all are stalled) 
> to release memory. This causes the fetchers to wait indefinitely.




[jira] [Updated] (TEZ-2176) Move all logging to slf4j

2015-03-23 Thread Siddharth Seth (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-2176:

Attachment: TEZ-2176.2.1.txt

Rebased version of TEZ-2176.2.

> Move all logging to slf4j
> -
>
> Key: TEZ-2176
> URL: https://issues.apache.org/jira/browse/TEZ-2176
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Vasanth kumar RJ
> Attachments: TEZ-2176.1.patch, TEZ-2176.2.1.txt, TEZ-2176.2.patch, 
> TEZ-2176.patch
>
>
> SLF4J supports a more comprehensive set of APIs - MDC, Formatted strings.
> Also drop commons-logging from the dependency set.




[jira] [Commented] (TEZ-2176) Move all logging to slf4j

2015-03-23 Thread Siddharth Seth (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376404#comment-14376404
 ] 

Siddharth Seth commented on TEZ-2176:
-

+1. Looks good. Attaching a rebased patch after the last few commits and 
committing, before this goes stale. Thanks [~vasanthkumar].





[jira] [Commented] (TEZ-2176) Move all logging to slf4j

2015-03-23 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376493#comment-14376493
 ] 

Bikas Saha commented on TEZ-2176:
-

There should probably be a follow-up jira to remove instances of 
LOG.isDebugEnabled() from the code, based on 
http://www.slf4j.org/faq.html#logging_performance
[~vasanthkumar] Do you think you can take a crack at it?
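The slf4j FAQ linked above recommends parameterized messages over explicit isDebugEnabled() guards. With parameterized messages, placeholder substitution happens only after the level check. The toy logger below is a hypothetical illustration of that idea and is not slf4j's implementation. Java still evaluates the arguments eagerly, including boxing and the varargs array, so a guard may remain worthwhile where computing an argument is itself expensive.

```java
// Toy logger for illustration only; NOT slf4j's implementation.
final class ToyLogger {
  private final boolean debugEnabled;
  int messagesBuilt = 0;  // how many times a message string was actually built

  ToyLogger(boolean debugEnabled) {
    this.debugEnabled = debugEnabled;
  }

  /** Parameterized debug call: the message is only built if debug is enabled. */
  void debug(String pattern, Object... args) {
    if (!debugEnabled) {
      return;  // no concatenation and no toString() calls on the arguments
    }
    System.out.println(format(pattern, args));
  }

  /** Replaces each "{}" with the next argument; surplus "{}" are left as-is. */
  String format(String pattern, Object... args) {
    messagesBuilt++;
    StringBuilder sb = new StringBuilder();
    int argIndex = 0;
    int i = 0;
    while (i < pattern.length()) {
      if (argIndex < args.length && pattern.startsWith("{}", i)) {
        sb.append(args[argIndex++]);  // String.valueOf semantics, so null prints "null"
        i += 2;
      } else {
        sb.append(pattern.charAt(i));
        i++;
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    ToyLogger off = new ToyLogger(false);
    off.debug("Fetched {} bytes from {}", 1024, "host1");
    System.out.println("messages built with debug off: " + off.messagesBuilt);

    ToyLogger on = new ToyLogger(true);
    on.debug("Fetched {} bytes from {}", 1024, "host1");
  }
}
```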

> Move all logging to slf4j
> -
>
> Key: TEZ-2176
> URL: https://issues.apache.org/jira/browse/TEZ-2176
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Vasanth kumar RJ
> Fix For: 0.7.0
>
> Attachments: TEZ-2176.1.patch, TEZ-2176.2.1.txt, TEZ-2176.2.patch, 
> TEZ-2176.patch
>
>
> SLF4J supports a more comprehensive set of APIs - MDC, Formatted strings.
> Also drop commons-logging from the dependency set.




[jira] [Updated] (TEZ-2149) Optimizations for the timed version of DAGClient.getStatus

2015-03-23 Thread Siddharth Seth (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-2149:

Attachment: TEZ-2149.1.txt

The patch adds a notify in the AM so that the call returns early instead of 
sleeping. It also changes the waitUntilCompletion methods to use this API 
instead of an explicit sleep.

[~bikassaha], [~hitesh], [~pramachandran] - please review.

> Optimizations for the timed version of DAGClient.getStatus
> --
>
> Key: TEZ-2149
> URL: https://issues.apache.org/jira/browse/TEZ-2149
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: TEZ-2149.1.txt
>
>
> From 
> https://issues.apache.org/jira/browse/TEZ-1967?focusedCommentId=14325037&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14325037
> - The sleep within the AM can be improved via monitors.
> - INITED state is returned when communicating with the AM, SUBMITTED state is 
> returned when communicating with the RM. That could be used to optimize the 
> flow.
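A monitor-based replacement for the sleep could look roughly like the sketch below. A timed status call waits on a monitor and returns as soon as the AM signals a state change, or when the timeout expires. StatusMonitor, its version counter and the string states are hypothetical, and this is not the code in TEZ-2149.1.txt.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the DAGClient/AM code from TEZ-2149.1.txt.
final class StatusMonitor {
  private String state = "RUNNING";  // illustrative state names only
  private long version = 0;          // bumped on every state change

  /** Called by the AM when the DAG's state changes. */
  synchronized void update(String newState) {
    state = newState;
    version++;
    notifyAll();  // wake timed callers immediately instead of letting them sleep on
  }

  synchronized long version() {
    return version;
  }

  /**
   * Blocks until the state changes after lastSeenVersion or timeoutMs elapses,
   * whichever comes first, and returns the current state.
   */
  synchronized String awaitChange(long lastSeenVersion, long timeoutMs)
      throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (version == lastSeenVersion) {  // loop also guards against spurious wakeups
      long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
      if (remainingMs <= 0) {
        break;  // timed out; wait(0) would block forever, so never call it
      }
      wait(remainingMs);  // releases the monitor while waiting
    }
    return state;
  }
}
```

Returning the state and the version together would avoid a race between awaitChange() and a separate version() call. The sketch keeps them separate for brevity.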




[jira] [Created] (TEZ-2222) Investigate moving to log4j2 for logging

2015-03-23 Thread Siddharth Seth (JIRA)

Siddharth Seth created TEZ-:
---

 Summary: Investigate moving to log4j2 for logging
 Key: TEZ-
 URL: https://issues.apache.org/jira/browse/TEZ-
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth


Via slf4j.

Some bits to keep in mind:
- We have explicit code which rotates logs using direct log4j12 APIs. This 
should keep working. I believe the log4j2 APIs are different here.
- API compatibility between log4j12 and log4j2 can be problematic if both end 
up on the classpath (I believe the APIs are different).
- The Hadoop dist includes an slf4j-log4j12 binding. Changing the default can 
result in slf4j-log4j12 and slf4j-log4j2 co-existing by default, which could be 
problematic. Needs investigation.

At the end of the day, we will likely need an option to use either of the two.
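For the co-existence concern, one quick check is to count how many slf4j 1.x bindings are visible on a classpath. Each slf4j 1.x binding, such as slf4j-log4j12, ships a class named org.slf4j.impl.StaticLoggerBinder, and slf4j itself warns that the "Class path contains multiple SLF4J bindings" when it finds more than one. The helper below is a hypothetical sketch and is not part of Tez or Hadoop.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper: lists every copy of the slf4j 1.x binder
// class visible to a class loader. More than one hit means two bindings
// co-exist, and which one is used is not under the application's control.
final class Slf4jBindingCheck {
  static final String BINDER_RESOURCE = "org/slf4j/impl/StaticLoggerBinder.class";

  private Slf4jBindingCheck() {}

  static List<URL> findBindings(ClassLoader loader) throws IOException {
    return Collections.list(loader.getResources(BINDER_RESOURCE));
  }

  public static void main(String[] args) throws IOException {
    List<URL> bindings = findBindings(Slf4jBindingCheck.class.getClassLoader());
    System.out.println("slf4j bindings found: " + bindings.size());
    for (URL url : bindings) {
      System.out.println("  " + url);
    }
    if (bindings.size() > 1) {
      System.out.println("WARNING: multiple slf4j bindings on the classpath");
    }
  }
}
```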




[jira] [Commented] (TEZ-2214) FetcherOrderedGrouped can get stuck indefinitely when MergeManager misses memToDiskMerging

2015-03-23 Thread Rajesh Balamohan (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376763#comment-14376763
 ] 

Rajesh Balamohan commented on TEZ-2214:
---

[~hitesh] - In such cases, the next line "inMemoryMerger.waitForMerge()" acts 
as the barrier. It waits until the existing merge completes (which 
internally releases memory for usedMemory & commitMemory).
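The interaction described above can be modelled with a small sketch. Starting a merge is a no-op while another merge is already running. A separate wait then acts as a barrier that holds a stalled fetcher until that merge has released memory. MergerSketch and its methods are hypothetical simplifications, not Tez's MergeManager, and the sketch assumes that a finished merge frees all of the memory in use.

```java
// Hypothetical simplification of the start-merge / wait-for-merge pattern;
// not Tez's MergeManager or its in-memory merger.
final class MergerSketch {
  private boolean inProgress = false;
  private long usedMemory;

  MergerSketch(long usedMemory) {
    this.usedMemory = usedMemory;
  }

  /** Starts a merge unless one is already running; returns false for the no-op case. */
  synchronized boolean startMerge() {
    if (inProgress) {
      return false;
    }
    inProgress = true;
    Thread merger = new Thread(this::runMerge, "in-memory-merger");
    merger.setDaemon(true);
    merger.start();
    return true;
  }

  private void runMerge() {
    try {
      Thread.sleep(20);  // simulate flushing in-memory segments to disk
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    synchronized (this) {
      usedMemory = 0;     // assumption: the merge frees everything it held
      inProgress = false;
      notifyAll();        // wake every thread blocked in waitForMerge()
    }
  }

  /** Barrier: blocks until no merge is in progress. */
  synchronized void waitForMerge() throws InterruptedException {
    while (inProgress) {
      wait();
    }
  }

  /** What a stalled fetcher does: trigger (or piggyback on) a merge, then wait for it. */
  void onFetcherStalled() throws InterruptedException {
    startMerge();    // may be a no-op if a merge is already running...
    waitForMerge();  // ...but this still blocks until that merge releases memory
  }

  synchronized long usedMemory() {
    return usedMemory;
  }
}
```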





[jira] [Comment Edited] (TEZ-2214) FetcherOrderedGrouped can get stuck indefinitely when MergeManager misses memToDiskMerging

2015-03-23 Thread Rajesh Balamohan (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376763#comment-14376763
 ] 

Rajesh Balamohan edited comment on TEZ-2214 at 3/23/15 10:21 PM:
-

[~hitesh] - In such cases, the next line "inMemoryMerger.waitForMerge()" acts 
as the barrier.  It would wait until the existing merge completes (which 
internally releases memory for usedMemory & commitMemory). 







Failed: TEZ-2149 PreCommit Build #331

2015-03-23 Thread Apache Jenkins Server

Jira: https://issues.apache.org/jira/browse/TEZ-2149
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/331/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2762 lines...]




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12706725/TEZ-2149.1.txt
  against master revision 6d0b10a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 186 javac 
compiler warnings (more than the master's current 180 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/331//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/331//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/331//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
51847fbdaa19c11add5148b625cd3be38588f1c8 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #330
Archived 45 artifacts
Archive block size is 32768
Received 19 blocks and 2106290 bytes
Compression is 22.8%
Took 1.6 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2149) Optimizations for the timed version of DAGClient.getStatus

2015-03-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376778#comment-14376778
 ] 

Hadoop QA commented on TEZ-2149:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12706725/TEZ-2149.1.txt
  against master revision 6d0b10a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 186 javac 
compiler warnings (more than the master's current 180 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/331//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/331//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/331//console

This message is automatically generated.

> Optimizations for the timed version of DAGClient.getStatus
> --
>
> Key: TEZ-2149
> URL: https://issues.apache.org/jira/browse/TEZ-2149
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: TEZ-2149.1.txt
>
>
> From 
> https://issues.apache.org/jira/browse/TEZ-1967?focusedCommentId=14325037&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14325037
> - The sleep within the AM can be improved via monitors.
> - INITED state is returned when communicating with the AM; SUBMITTED state is 
> returned when communicating with the RM. That could be used to optimize the 
> flow.
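The first bullet suggests replacing the fixed sleep in the AM with a monitor. Below is a minimal sketch of that idea. It assumes a hypothetical `StatusWaiter` whose `update()` is called whenever the DAG state changes; neither the class nor its methods are Tez API. A timed status call waits on the monitor and returns as soon as the state changes, instead of always sleeping for the full timeout.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical holder for a DAG state that timed status calls can wait on. */
final class StatusWaiter {
    private final Object lock = new Object();
    private String state;   // guarded by lock
    private long version;   // bumped on every update, guarded by lock

    StatusWaiter(String initialState) {
        this.state = initialState;
    }

    /** Called by whichever thread changes the state; wakes all waiters. */
    void update(String newState) {
        synchronized (lock) {
            state = newState;
            version++;
            lock.notifyAll();
        }
    }

    /**
     * Returns as soon as the state changes, or once timeoutMillis has elapsed,
     * instead of unconditionally sleeping for the whole timeout.
     */
    String awaitChange(long timeoutMillis) throws InterruptedException {
        synchronized (lock) {
            long startVersion = version;
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            // Loop to guard against spurious wakeups.
            while (version == startVersion) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    break;
                }
                TimeUnit.NANOSECONDS.timedWait(lock, remainingNanos);
            }
            return state;
        }
    }
}
```

A timed getStatus could call `awaitChange(timeout)` where it currently sleeps; callers then see a state change as soon as it happens.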



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Resolved] (TEZ-1937) Reduce cost of merging ifiles in UnorderedPartitionedWriter

2015-03-23 Thread Rajesh Balamohan (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan resolved TEZ-1937.
---
Resolution: Duplicate

This is already taken care of as part of fixing TEZ-1094. Marking this as a 
duplicate.

> Reduce cost of merging ifiles in UnorderedPartitionedWriter
> ---
>
> Key: TEZ-1937
> URL: https://issues.apache.org/jira/browse/TEZ-1937
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: TEZ-1937.1.patch, TEZ-1937.2.patch, TEZ-1937.WIP.patch
>
>
> Currently we iterate through all spilled files for merging.  This incurs 
> additional deserialization cost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (TEZ-2076) Tez framework to extract/analyze data stored in ATS for specific dag

2015-03-23 Thread Rajesh Balamohan (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-2076:
--
Attachment: TEZ-2076.6.patch

Fixed a minor pom.xml issue.

> Tez framework to extract/analyze data stored in ATS for specific dag
> 
>
> Key: TEZ-2076
> URL: https://issues.apache.org/jira/browse/TEZ-2076
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: TEZ-2076.1.patch, TEZ-2076.2.patch, TEZ-2076.3.patch, 
> TEZ-2076.4.patch, TEZ-2076.5.patch, TEZ-2076.6.patch, TEZ-2076.WIP.2.patch, 
> TEZ-2076.WIP.3.patch, TEZ-2076.WIP.patch
>
>
> - Users should be able to download ATS data pertaining to a DAG from Tez-UI 
> (something like a zip file containing DAG/Vertex/Task/TaskAttempt info).
> - This data can be plugged into an analyzer, which parses it, adds semantics 
> and provides an in-memory representation for further analysis.
> - This will make it possible to write different analyzer rules, which can run 
> on top of this in-memory representation to produce an analysis of the DAG.
> - Results of these analyzer rules can be rendered onto a UI (standalone webapp) 
> at a later point in time.
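The analyzer described in this issue could be structured as pluggable rules over an in-memory model of the downloaded ATS data. Below is a minimal sketch of that shape. It assumes hypothetical types (`TaskAttemptInfo`, `AnalyzerRule`, `SlowAttemptRule`) that are not part of the attached patches. The example rule simply flags task attempts that ran longer than a threshold.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory record parsed from ATS task-attempt data. */
final class TaskAttemptInfo {
    final String attemptId;
    final String vertexName;
    final long startTime;   // epoch millis
    final long finishTime;  // epoch millis

    TaskAttemptInfo(String attemptId, String vertexName, long startTime, long finishTime) {
        this.attemptId = attemptId;
        this.vertexName = vertexName;
        this.startTime = startTime;
        this.finishTime = finishTime;
    }

    long durationMillis() {
        return finishTime - startTime;
    }
}

/** Hypothetical pluggable rule evaluated over the in-memory DAG data. */
interface AnalyzerRule {
    String name();
    List<String> analyze(List<TaskAttemptInfo> attempts);
}

/** Example rule: report attempts whose duration exceeds a fixed threshold. */
final class SlowAttemptRule implements AnalyzerRule {
    private final long thresholdMillis;

    SlowAttemptRule(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    @Override
    public String name() {
        return "SlowAttemptRule";
    }

    @Override
    public List<String> analyze(List<TaskAttemptInfo> attempts) {
        List<String> findings = new ArrayList<>();
        for (TaskAttemptInfo a : attempts) {
            if (a.durationMillis() > thresholdMillis) {
                findings.add(a.vertexName + "/" + a.attemptId + " took " + a.durationMillis() + " ms");
            }
        }
        return findings;
    }
}
```

New rules would implement `AnalyzerRule` and run over the same parsed data; their string findings are what a later UI could render.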



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Success: TEZ-2076 PreCommit Build #332

2015-03-23 Thread Apache Jenkins Server

Jira: https://issues.apache.org/jira/browse/TEZ-2076
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/332/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2767 lines...]
[INFO] Final Memory: 82M/846M
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12706758/TEZ-2076.6.patch
  against master revision 6d0b10a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/332//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/332//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
03d06ac089aa0174e7fd748b49510ec9b96dd930 logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #330
Archived 53 artifacts
Archive block size is 32768
Received 6 blocks and 7384023 bytes
Compression is 2.6%
Took 2.3 sec
Description set: TEZ-2076
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2076) Tez framework to extract/analyze data stored in ATS for specific dag

2015-03-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376958#comment-14376958
 ] 

Hadoop QA commented on TEZ-2076:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12706758/TEZ-2076.6.patch
  against master revision 6d0b10a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/332//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/332//console

This message is automatically generated.

> Tez framework to extract/analyze data stored in ATS for specific dag
> 
>
> Key: TEZ-2076
> URL: https://issues.apache.org/jira/browse/TEZ-2076
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: TEZ-2076.1.patch, TEZ-2076.2.patch, TEZ-2076.3.patch, 
> TEZ-2076.4.patch, TEZ-2076.5.patch, TEZ-2076.6.patch, TEZ-2076.WIP.2.patch, 
> TEZ-2076.WIP.3.patch, TEZ-2076.WIP.patch
>
>
> - Users should be able to download ATS data pertaining to a DAG from Tez-UI 
> (something like a zip file containing DAG/Vertex/Task/TaskAttempt info).
> - This data can be plugged into an analyzer, which parses it, adds semantics 
> and provides an in-memory representation for further analysis.
> - This will make it possible to write different analyzer rules, which can run 
> on top of this in-memory representation to produce an analysis of the DAG.
> - Results of these analyzer rules can be rendered onto a UI (standalone webapp) 
> at a later point in time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (TEZ-2204) TestAMRecovery increasingly flaky on jenkins builds.

2015-03-23 Thread Jeff Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377018#comment-14377018
 ] 

Jeff Zhang commented on TEZ-2204:
-

Uploaded a new patch (excludes the findbugs warning).

[~hitesh] [~bikassaha] Please help review it.

> TestAMRecovery increasingly flaky on jenkins builds. 
> -
>
> Key: TEZ-2204
> URL: https://issues.apache.org/jira/browse/TEZ-2204
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Jeff Zhang
> Attachments: TEZ-2204-1.patch, TEZ-2204-2.patch, TEZ-2204-3.patch
>
>
> In recent pre-commit builds and daily builds, there seem to have been some 
> occurrences of TestAMRecovery failing or timing out. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (TEZ-2204) TestAMRecovery increasingly flaky on jenkins builds.

2015-03-23 Thread Jeff Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2204:

Attachment: TEZ-2204-3.patch

> TestAMRecovery increasingly flaky on jenkins builds. 
> -
>
> Key: TEZ-2204
> URL: https://issues.apache.org/jira/browse/TEZ-2204
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Jeff Zhang
> Attachments: TEZ-2204-1.patch, TEZ-2204-2.patch, TEZ-2204-3.patch
>
>
> In recent pre-commit builds and daily builds, there seem to have been some 
> occurrences of TestAMRecovery failing or timing out. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (TEZ-2204) TestAMRecovery increasingly flaky on jenkins builds.

2015-03-23 Thread Jeff Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2204:

Attachment: TEZ-2204-4.patch

Minor update on the patch

> TestAMRecovery increasingly flaky on jenkins builds. 
> -
>
> Key: TEZ-2204
> URL: https://issues.apache.org/jira/browse/TEZ-2204
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Jeff Zhang
> Attachments: TEZ-2204-1.patch, TEZ-2204-2.patch, TEZ-2204-3.patch, 
> TEZ-2204-4.patch
>
>
> In recent pre-commit builds and daily builds, there seem to have been some 
> occurrences of TestAMRecovery failing or timing out. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (TEZ-2217) The min-held-containers constraint is not enforced during query runtime

2015-03-23 Thread Bikas Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-2217:

Attachment: TEZ-2217.1.patch

Attaching a fix that ensures that, when there are no further pending container 
requests, new containers are not released if they have been added to the 
min-held list. This should be safe because there are no pending requests. 
[~gopalv] Can you please try this out and see if this fixes your case? If so, 
then a review would be great :) The code change is minimal and explained above. 
The test was a pain to write :P
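The rule described in this comment can be sketched as a pure decision function. This is a minimal sketch that assumes a hypothetical `IdleReleasePolicy.shouldRelease` helper. It is not the actual YarnTaskSchedulerService code, and later comments on this issue report that containers were still being released with the patch applied.

```java
import java.util.Set;

/** Hypothetical sketch of the idle-container release decision described above. */
final class IdleReleasePolicy {
    private IdleReleasePolicy() {}

    /**
     * @param containerId         the container whose idle timeout has expired
     * @param heldContainers      containers that have been added to the min-held list
     * @param pendingRequestCount outstanding container requests in the scheduler
     * @return true if the container should be released back to YARN
     */
    static boolean shouldRelease(String containerId,
                                 Set<String> heldContainers,
                                 int pendingRequestCount) {
        if (pendingRequestCount == 0 && heldContainers.contains(containerId)) {
            // No pending requests: keep the min-held container (pre-warmed capacity).
            return false;
        }
        // Otherwise the idle container may still be released.
        return true;
    }
}
```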

> The min-held-containers constraint is not enforced during query runtime 
> 
>
> Key: TEZ-2217
> URL: https://issues.apache.org/jira/browse/TEZ-2217
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Gopal V
>Assignee: Bikas Saha
> Attachments: TEZ-2217.1.patch, TEZ-2217.txt.bz2
>
>
> The min-held containers constraint is respected during query idle times, but 
> is not respected when a query is actually in motion.
> The AM releases unused containers during dag execution without checking for 
> min-held containers.
> {code}
> 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] 
> rm.YarnTaskSchedulerService: Container's idle timeout expired. Releasing 
> container, containerId=container_1424502260528_1348_01_13, 
> containerExpiryTime=1426891313264, idleTimeoutMin=5000
> 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] 
> rm.YarnTaskSchedulerService: Releasing unused container: 
> container_1424502260528_1348_01_13
> {code}
> This is actually useful only after the AM has received a soft pre-emption 
> message; doing it on an idle cluster slows down one of the most common query 
> patterns in BI systems.
> {code}
> create temporary table smalltable as ...; 
> select ... bigtable JOIN smalltable ON ...;
> {code}
> The smaller query in the beginning throws away the pre-warmed capacity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (TEZ-2217) The min-held-containers constraint is not enforced during query runtime

2015-03-23 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377068#comment-14377068
 ] 

Gopal V commented on TEZ-2217:
--

I quickly cross-checked this; it still seems to be letting go of containers 
despite min-held being > queue size.

The containers were observed as being released during the getSplits() operation.

{code}
2015-03-23 18:35:14,865 INFO [InputInitializer [Map 1] #0] io.HiveInputFormat: 
Generating splits
2015-03-23 18:35:14,870 INFO [InputInitializer [Map 1] #0] log.PerfLogger: 

2015-03-23 18:35:14,889 INFO [DelayedContainerManager] 
rm.YarnTaskSchedulerService: Container's idle timeout expired. Releasing 
container, containerId=container_1424502260528_1391_01_000310, 
containerExpiryTime=1427160914665, idleTimeoutMin=5000
2015-03-23 18:35:14,889 INFO [DelayedContainerManager] 
rm.YarnTaskSchedulerService: Releasing unused container: 
container_1424502260528_1391_01_000310
2015-03-23 18:35:14,889 INFO [Dispatcher thread: Central] 
history.HistoryEventHandler: 
[HISTORY][DAG:dag_1424502260528_1391_11][Event:CONTAINER_STOPPED]: 
containerId=container_1424502260528_1391_01_000310, stoppedTime=1427160914889, 
exitStatus=0
2015-03-23 18:35:14,889 INFO [Dispatcher thread: Central] 
container.AMContainerImpl: AMContainer container_1424502260528_1391_01_000310 
transitioned from IDLE to STOP_REQUESTED via event C_STOP_REQUEST
2015-03-23 18:35:14,890 INFO [ContainerLauncher #25] 
launcher.ContainerLauncherImpl: Processing the event EventType: 
CONTAINER_STOP_REQUEST
{code}

> The min-held-containers constraint is not enforced during query runtime 
> 
>
> Key: TEZ-2217
> URL: https://issues.apache.org/jira/browse/TEZ-2217
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Gopal V
>Assignee: Bikas Saha
> Attachments: TEZ-2217.1.patch, TEZ-2217.txt.bz2
>
>
> The min-held containers constraint is respected during query idle times, but 
> is not respected when a query is actually in motion.
> The AM releases unused containers during dag execution without checking for 
> min-held containers.
> {code}
> 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] 
> rm.YarnTaskSchedulerService: Container's idle timeout expired. Releasing 
> container, containerId=container_1424502260528_1348_01_13, 
> containerExpiryTime=1426891313264, idleTimeoutMin=5000
> 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] 
> rm.YarnTaskSchedulerService: Releasing unused container: 
> container_1424502260528_1348_01_13
> {code}
> This is actually useful only after the AM has received a soft pre-emption 
> message; doing it on an idle cluster slows down one of the most common query 
> patterns in BI systems.
> {code}
> create temporary table smalltable as ...; 
> select ... bigtable JOIN smalltable ON ...;
> {code}
> The smaller query in the beginning throws away the pre-warmed capacity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (TEZ-2217) The min-held-containers constraint is not enforced during query runtime

2015-03-23 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377072#comment-14377072
 ] 

Gopal V commented on TEZ-2217:
--

[~bikassaha]: any suggestions for additional logging in the code to narrow this down?

> The min-held-containers constraint is not enforced during query runtime 
> 
>
> Key: TEZ-2217
> URL: https://issues.apache.org/jira/browse/TEZ-2217
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Gopal V
>Assignee: Bikas Saha
> Attachments: TEZ-2217.1.patch, TEZ-2217.txt.bz2
>
>
> The min-held containers constraint is respected during query idle times, but 
> is not respected when a query is actually in motion.
> The AM releases unused containers during dag execution without checking for 
> min-held containers.
> {code}
> 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] 
> rm.YarnTaskSchedulerService: Container's idle timeout expired. Releasing 
> container, containerId=container_1424502260528_1348_01_13, 
> containerExpiryTime=1426891313264, idleTimeoutMin=5000
> 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] 
> rm.YarnTaskSchedulerService: Releasing unused container: 
> container_1424502260528_1348_01_13
> {code}
> This is actually useful only after the AM has received a soft pre-emption 
> message; doing it on an idle cluster slows down one of the most common query 
> patterns in BI systems.
> {code}
> create temporary table smalltable as ...; 
> select ... bigtable JOIN smalltable ON ...;
> {code}
> The smaller query in the beginning throws away the pre-warmed capacity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Failed: TEZ-2204 PreCommit Build #333

2015-03-23 Thread Apache Jenkins Server

Jira: https://issues.apache.org/jira/browse/TEZ-2204
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/333/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2753 lines...]



{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12706785/TEZ-2204-4.patch
  against master revision 6d0b10a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/333//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/333//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
e96c6d82358f4d860778235c1794bfe40a782908 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #332
Archived 44 artifacts
Archive block size is 32768
Received 8 blocks and 2461464 bytes
Compression is 9.6%
Took 0.89 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2204) TestAMRecovery increasingly flaky on jenkins builds.

2015-03-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377074#comment-14377074
 ] 

Hadoop QA commented on TEZ-2204:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12706785/TEZ-2204-4.patch
  against master revision 6d0b10a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/333//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/333//console

This message is automatically generated.

> TestAMRecovery increasingly flaky on jenkins builds. 
> -
>
> Key: TEZ-2204
> URL: https://issues.apache.org/jira/browse/TEZ-2204
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Jeff Zhang
> Attachments: TEZ-2204-1.patch, TEZ-2204-2.patch, TEZ-2204-3.patch, 
> TEZ-2204-4.patch
>
>
> In recent pre-commit builds and daily builds, there seem to have been some 
> occurrences of TestAMRecovery failing or timing out. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (TEZ-2223) TestMockDAGAppMaster fails due to TEZ-2210

2015-03-23 Thread Jeff Zhang (JIRA)

Jeff Zhang created TEZ-2223:
---

 Summary: TestMockDAGAppMaster fails due to TEZ-2210
 Key: TEZ-2223
 URL: https://issues.apache.org/jira/browse/TEZ-2223
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang


[~bikassaha] looks like TestMockDAGAppMaster fails due to TEZ-2210 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (TEZ-2223) TestMockDAGAppMaster fails due to TEZ-2210

2015-03-23 Thread Jeff Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2223:

Description: 
[~bikassaha] looks like TestMockDAGAppMaster fails due to TEZ-2210 
It fails on Mac because cpuPlugin is null.

  was:[~bikassaha] looks like TestMockDAGAppMaster fails due to TEZ-2210 


> TestMockDAGAppMaster fails due to TEZ-2210
> --
>
> Key: TEZ-2223
> URL: https://issues.apache.org/jira/browse/TEZ-2223
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>
> [~bikassaha] looks like TestMockDAGAppMaster fails due to TEZ-2210 
> It fails on Mac because cpuPlugin is null.

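The report only states that the test fails on Mac because cpuPlugin is null; the eventual fix is not shown here. Below is a minimal sketch of a defensive null guard. It assumes a hypothetical `CpuUsagePlugin` interface and a `CpuUsageReporter` class that fall back to a sentinel value when no plugin is available for the platform.

```java
/** Hypothetical stand-in for a platform-specific CPU usage plugin. */
interface CpuUsagePlugin {
    long getCumulativeCpuTimeMillis();
}

/** Hypothetical reporter that tolerates a missing plugin on unsupported platforms. */
final class CpuUsageReporter {
    static final long UNAVAILABLE = -1L;

    // May be null when no plugin exists for the current OS.
    private final CpuUsagePlugin cpuPlugin;

    CpuUsageReporter(CpuUsagePlugin cpuPlugin) {
        this.cpuPlugin = cpuPlugin;
    }

    long cumulativeCpuTimeMillis() {
        // Return a sentinel instead of dereferencing a null plugin.
        return cpuPlugin == null ? UNAVAILABLE : cpuPlugin.getCumulativeCpuTimeMillis();
    }
}
```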


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (TEZ-2223) TestMockDAGAppMaster fails due to TEZ-2210 on mac

2015-03-23 Thread Jeff Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2223:

Summary: TestMockDAGAppMaster fails due to TEZ-2210 on mac  (was: 
TestMockDAGAppMaster fails due to TEZ-2210)

> TestMockDAGAppMaster fails due to TEZ-2210 on mac
> -
>
> Key: TEZ-2223
> URL: https://issues.apache.org/jira/browse/TEZ-2223
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>
> [~bikassaha] looks like TestMockDAGAppMaster fails due to TEZ-2210 
> It fails on Mac because cpuPlugin is null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (TEZ-2217) The min-held-containers constraint is not enforced during query runtime

2015-03-23 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377077#comment-14377077
 ] 

Bikas Saha commented on TEZ-2217:
-

Sorry, to be clear: is this with the patch attached?

> The min-held-containers constraint is not enforced during query runtime 
> 
>
> Key: TEZ-2217
> URL: https://issues.apache.org/jira/browse/TEZ-2217
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Gopal V
>Assignee: Bikas Saha
> Attachments: TEZ-2217.1.patch, TEZ-2217.txt.bz2
>
>
> The min-held containers constraint is respected during query idle times, but 
> is not respected when a query is actually in motion.
> The AM releases unused containers during dag execution without checking for 
> min-held containers.
> {code}
> 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] 
> rm.YarnTaskSchedulerService: Container's idle timeout expired. Releasing 
> container, containerId=container_1424502260528_1348_01_13, 
> containerExpiryTime=1426891313264, idleTimeoutMin=5000
> 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] 
> rm.YarnTaskSchedulerService: Releasing unused container: 
> container_1424502260528_1348_01_13
> {code}
> This is actually useful only after the AM has received a soft pre-emption 
> message; doing it on an idle cluster slows down one of the most common query 
> patterns in BI systems.
> {code}
> create temporary table smalltable as ...; 
> select ... bigtable JOIN smalltable ON ...;
> {code}
> The smaller query in the beginning throws away the pre-warmed capacity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (TEZ-2221) VertexGroup name should be unique

2015-03-23 Thread Jeff Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2221:

Attachment: TEZ-2221-1.patch

> VertexGroup name should be unique
> -
>
> Key: TEZ-2221
> URL: https://issues.apache.org/jira/browse/TEZ-2221
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-2221-1.patch
>
>
> VertexGroupCommitStartedEvent & VertexGroupCommitFinishedEvent use the vertex 
> group name to identify the vertex group commit, so vertex groups with the same 
> name will conflict. However, the current equals & hashCode of VertexGroup use 
> both the vertex group name and the member names.
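Because the commit events key on the group name alone, one way to avoid the conflict is to reject duplicate group names when the DAG is built. Below is a minimal sketch of that check. It assumes a hypothetical `VertexGroupRegistry` (not Tez's DAG API) that keys groups by name only and throws on a duplicate, even if the member sets differ.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical registry enforcing that vertex group names are unique per DAG. */
final class VertexGroupRegistry {
    // Keyed by group name only, matching how the commit events identify a group.
    private final Map<String, Set<String>> groupsByName = new HashMap<>();

    void addGroup(String groupName, String... memberNames) {
        if (groupsByName.containsKey(groupName)) {
            // Reject even if the members differ: the events could not tell the groups apart.
            throw new IllegalStateException("VertexGroup name '" + groupName + "' already exists");
        }
        Set<String> members = new LinkedHashSet<>();
        for (String m : memberNames) {
            members.add(m);
        }
        groupsByName.put(groupName, members);
    }

    int size() {
        return groupsByName.size();
    }
}
```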



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (TEZ-2217) The min-held-containers constraint is not enforced during query runtime

2015-03-23 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377083#comment-14377083
 ] 

Bikas Saha commented on TEZ-2217:
-

If it's with the patch, then it would mean that the scheduler has non-empty task 
requests at that time. With the fix, can you please attach the AM logs with 
debug logging enabled for the YarnTaskSchedulerService only? Otherwise they will 
have RPC junk in them. Thanks

> The min-held-containers constraint is not enforced during query runtime 
> 
>
> Key: TEZ-2217
> URL: https://issues.apache.org/jira/browse/TEZ-2217
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Gopal V
>Assignee: Bikas Saha
> Attachments: TEZ-2217.1.patch, TEZ-2217.txt.bz2
>
>
> The min-held containers constraint is respected during query idle times, but 
> is not respected when a query is actually in motion.
> The AM releases unused containers during dag execution without checking for 
> min-held containers.
> {code}
> 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] 
> rm.YarnTaskSchedulerService: Container's idle timeout expired. Releasing 
> container, containerId=container_1424502260528_1348_01_13, 
> containerExpiryTime=1426891313264, idleTimeoutMin=5000
> 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] 
> rm.YarnTaskSchedulerService: Releasing unused container: 
> container_1424502260528_1348_01_13
> {code}
> This is actually useful only after the AM has received a soft pre-emption 
> message; doing it on an idle cluster slows down one of the most common query 
> patterns in BI systems.
> {code}
> create temporary table smalltable as ...; 
> select ... bigtable JOIN smalltable ON ...;
> {code}
> The smaller query in the beginning throws away the pre-warmed capacity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (TEZ-2217) The min-held-containers constraint is not enforced during query runtime

2015-03-23 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377086#comment-14377086
 ] 

Gopal V commented on TEZ-2217:
--

Yes. The log does not contain "delay expired or is new.", which seems to be in 
the codepath that this patch changed.

That is why I asked about new logging.

> The min-held-containers constraint is not enforced during query runtime 
> 
>
> Key: TEZ-2217
> URL: https://issues.apache.org/jira/browse/TEZ-2217
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Gopal V
>Assignee: Bikas Saha
> Attachments: TEZ-2217.1.patch, TEZ-2217.txt.bz2
>
>
> The min-held containers constraint is respected during query idle times, but 
> is not respected when a query is actually in motion.
> The AM releases unused containers during dag execution without checking for 
> min-held containers.
> {code}
> 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] 
> rm.YarnTaskSchedulerService: Container's idle timeout expired. Releasing 
> container, containerId=container_1424502260528_1348_01_13, 
> containerExpiryTime=1426891313264, idleTimeoutMin=5000
> 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] 
> rm.YarnTaskSchedulerService: Releasing unused container: 
> container_1424502260528_1348_01_13
> {code}
> This is actually useful only after the AM has received a soft pre-emption 
> message; doing it on an idle cluster slows down one of the most common query 
> patterns in BI systems.
> {code}
> create temporary table smalltable as ...; 
> select ... bigtable JOIN smalltable ON ...;
> {code}
> The smaller query in the beginning throws away the pre-warmed capacity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (TEZ-714) OutputCommitters should not run in the main AM dispatcher thread

2015-03-23 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377105#comment-14377105
 ] 

Bikas Saha commented on TEZ-714:


I have not looked at the patch yet because it may change if you agree with these comments.
bq. Regarding the issue of "not sure why group-commit and non-group commit need 
to be differentiated in different transitions."
Can this be fixed by having different events for the two cases, but still 
handling them in the same transition? The transition can check whether it is a 
group commit event or a normal commit event (based on event type) and then log 
for the group commit. Maybe the group commit event can derive from the normal 
commit event.
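The single-transition approach suggested in this comment could look like the following. This is a minimal sketch with hypothetical event and transition classes, not Tez's state-machine API. The group-commit event extends the normal commit event, and one transition adds the group-specific recovery logging through an `instanceof` check; an explicit event-type enum would work equally well.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical event signalling that an output commit finished. */
class CommitCompletedEvent {
    final String outputName;
    final boolean succeeded;

    CommitCompletedEvent(String outputName, boolean succeeded) {
        this.outputName = outputName;
        this.succeeded = succeeded;
    }
}

/** Hypothetical group-commit event derived from the normal commit event. */
class GroupCommitCompletedEvent extends CommitCompletedEvent {
    final String groupName;

    GroupCommitCompletedEvent(String groupName, String outputName, boolean succeeded) {
        super(outputName, succeeded);
        this.groupName = groupName;
    }
}

/** One transition handles both event kinds and adds only the group-specific step. */
final class CommitCompletedTransition {
    final List<String> recoveryLog = new ArrayList<>();
    int finishedCommits;

    void transition(CommitCompletedEvent event) {
        if (event instanceof GroupCommitCompletedEvent) {
            // Only group commits write an extra recovery entry.
            GroupCommitCompletedEvent g = (GroupCommitCompletedEvent) event;
            recoveryLog.add("groupCommitFinished:" + g.groupName);
        }
        // Shared handling for every commit.
        finishedCommits++;
    }
}
```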

Is writing this recovery log relevant only in the non-commit-at-end case, where 
group commits can happen before the DAG finishes?

bq. Unit test is still not perfect, because currently in DAGImpl/VertexImpl 
we run the shared thread pool in the AsyncDispatcher.
For these tests we could choose to use the normal thread pool by overriding the 
setup. Since this is a new test, it can avoid depending on ordering the way the 
existing tests do. If so, then it should be fine to use the real threadpool 
instead of the fake thread pool that delegates to the dispatcher. Maybe you can 
create a new TestCommit that starts from scratch without the hacks in 
TestVertexImpl.

bq. For some existing transitions, like (RUNNING to ERROR due to INTERNAL 
ERROR)
Is this for VertexImpl or DAGImpl? That sounds like a bug. Is that relevant to 
the commit operation though?

> OutputCommitters should not run in the main AM dispatcher thread
> 
>
> Key: TEZ-714
> URL: https://issues.apache.org/jira/browse/TEZ-714
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Jeff Zhang
>Priority: Critical
> Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch, Vertex_2.pdf
>
>
> Follow-up JIRA from TEZ-41.
> 1) If there are multiple OutputCommitters on a Vertex, they can be run in 
> parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event 
> handling w.r.t. the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.




[jira] [Comment Edited] (TEZ-714) OutputCommitters should not run in the main AM dispatcher thread

2015-03-23 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377105#comment-14377105
 ] 

Bikas Saha edited comment on TEZ-714 at 3/24/15 2:05 AM:
-

I have not looked at the patch yet, because it may change if you agree with these comments.
bq. Regarding the issue of "not sure why group-commit and non-group commit need 
to be differentiated in different transitions."
Can this be fixed by making the events for the two kinds of commit different, but still 
handling them in the same transition? The transition can check whether it is a group 
commit event or a normal commit event (based on the event type), and then write the 
log for the group commit. Maybe the group commit event can derive from the normal 
commit event. IMO, having fewer transitions makes the code much simpler.

Is writing this recovery log relevant only in the non-commit-at-end case, where 
group commits can happen before the DAG finishes?

bq. The unit test is still not perfect, because currently in DAGImpl/VertexImpl 
we run the shared thread pool in the AsyncDispatcher.
For these tests we could choose to use the normal thread pool by overriding the 
setup. Since this is a new test, it can try not to depend on ordering, unlike the 
existing tests. If so, it should be fine to use the real thread pool instead of 
the fake thread pool that delegates to the dispatcher. Maybe you can create a new 
TestCommit that starts from scratch, without the hacks in TestVertexImpl.

bq. For some existing transitions, like RUNNING to ERROR due to INTERNAL 
ERROR
Is this for VertexImpl or DAGImpl? That sounds like a bug. Is it relevant to 
the commit operation, though?



> OutputCommitters should not run in the main AM dispatcher thread
> 
>
> Key: TEZ-714
> URL: https://issues.apache.org/jira/browse/TEZ-714
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Jeff Zhang
>Priority: Critical
> Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch, Vertex_2.pdf
>
>
> Follow-up JIRA from TEZ-41.
> 1) If there are multiple OutputCommitters on a Vertex, they can be run in 
> parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event 
> handling w.r.t. the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.




[jira] [Commented] (TEZ-2217) The min-held-containers constraint is not enforced during query runtime

2015-03-23 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377115#comment-14377115
 ] 

Bikas Saha commented on TEZ-2217:
-

The existing debug logs should be enough, if enabled. What is intriguing is that, at 
this point in time, there are pending task requests that have not been matched to 
the containers. That is surprising because I am guessing that the job already has 
all the containers it will ever get. If there were no pending requests, it would hit 
the changed code path (AM is idle or there are no pending requests).
How does the min expiry time compare to the delays between node, rack, and star 
matching? I would hope that the containers have been tried for matching all the way 
up to star before the min expiry elapses. In that case all tasks should have been 
matched to some container, leaving no pending task requests.
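To make the timing question concrete, here is a hedged Java sketch of a simple delay-based escalation from node to rack to star (any) matching. The two delay parameters and the class names are assumptions for illustration, not the YarnTaskSchedulerService implementation. The check only asks whether a container becomes eligible for any-level matching by the time its idle timeout elapses (the log in this issue shows idleTimeoutMin=5000).

```java
// Locality levels a held container can be matched at, from most to least specific.
enum MatchLevel { NODE, RACK, ANY }

final class LocalityDelaySketch {
    // Level at which a container that has been idle for idleMs may be matched,
    // assuming node-local matching only until nodeDelayMs, rack-local until
    // nodeDelayMs + rackDelayMs, and any (star) after that.
    static MatchLevel levelAfter(long idleMs, long nodeDelayMs, long rackDelayMs) {
        if (idleMs < nodeDelayMs) {
            return MatchLevel.NODE;
        }
        if (idleMs < nodeDelayMs + rackDelayMs) {
            return MatchLevel.RACK;
        }
        return MatchLevel.ANY;
    }

    // True if, by the time the idle timeout elapses, the container has become
    // eligible for ANY-level matching, so every pending request had a chance at it.
    static boolean reachesAnyByExpiry(long idleTimeoutMs, long nodeDelayMs, long rackDelayMs) {
        return levelAfter(idleTimeoutMs, nodeDelayMs, rackDelayMs) == MatchLevel.ANY;
    }
}
```

Under these assumptions, delays of 1000 ms each reach star well within a 5000 ms timeout, while delays of 3000 ms each would not.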

> The min-held-containers constraint is not enforced during query runtime 
> 
>
> Key: TEZ-2217
> URL: https://issues.apache.org/jira/browse/TEZ-2217
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Gopal V
>Assignee: Bikas Saha
> Attachments: TEZ-2217.1.patch, TEZ-2217.txt.bz2
>
>
> The min-held containers constraint is respected during query idle times, but 
> is not respected when a query is actually in motion.
> The AM releases unused containers during dag execution without checking for 
> min-held containers.
> {code}
> 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] 
> rm.YarnTaskSchedulerService: Container's idle timeout expired. Releasing 
> container, containerId=container_1424502260528_1348_01_13, 
> containerExpiryTime=1426891313264, idleTimeoutMin=5000
> 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] 
> rm.YarnTaskSchedulerService: Releasing unused container: 
> container_1424502260528_1348_01_13
> {code}
> Releasing containers is actually useful only after the AM has received a soft 
> pre-emption message; doing it on an idle cluster slows down one of the most 
> common query patterns in BI systems.
> {code}
> create temporary table smalltable as ...; 
> select ... bigtable JOIN smalltable ON ...;
> {code}
> The smaller query in the beginning throws away the pre-warmed capacity.
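The check the report asks for can be sketched as applying the same min-held floor during DAG execution that already applies when the session is idle. This is a minimal Java sketch with hypothetical names and plain integer counts, not the scheduler's actual code; it only computes how many expired idle containers may be released without dropping below the floor.

```java
final class MinHeldReleaseSketch {
    // Number of expired idle containers that may be released while still keeping
    // at least minHeldContainers held. heldContainers includes the expired ones.
    static int releasable(int heldContainers, int expiredIdleContainers, int minHeldContainers) {
        int surplus = Math.max(0, heldContainers - minHeldContainers);
        return Math.min(expiredIdleContainers, surplus);
    }
}
```

With 10 held containers, 4 of them past their idle timeout, and a floor of 8, only 2 would be released; the pre-warmed capacity above the floor is what the second query in the pattern above would lose.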




[jira] [Commented] (TEZ-2217) The min-held-containers constraint is not enforced during query runtime

2015-03-23 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377118#comment-14377118
 ] 

Bikas Saha commented on TEZ-2217:
-

This may help with enabling debug logs for only one class:
{noformat}
  /**
   * Root Logging level passed to the Tez app master.
   *
   * Simple configuration: Set the log level for all loggers.
   *   e.g. INFO
   *   This sets the log level to INFO for all loggers.
   *
   * Advanced configuration: Set the log level for all classes, along with a different level for some.
   *   e.g. DEBUG;org.apache.hadoop.ipc=INFO;org.apache.hadoop.security=INFO
   *   This sets the log level for all loggers to DEBUG, except for the
   *   org.apache.hadoop.ipc and org.apache.hadoop.security, which are set to INFO
   *
   * Note: The global log level must always be the first parameter.
   *   DEBUG;org.apache.hadoop.ipc=INFO;org.apache.hadoop.security=INFO is valid
   *   org.apache.hadoop.ipc=INFO;org.apache.hadoop.security=INFO is not valid
   * */
  @ConfigurationScope(Scope.AM)
  public static final String TEZ_AM_LOG_LEVEL = TEZ_AM_PREFIX + "log.level";
  public static final String TEZ_AM_LOG_LEVEL_DEFAULT = "INFO";
{noformat}
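A hedged Java sketch of a parser for the format this Javadoc describes (global level first, then `logger=LEVEL` pairs separated by `;`). `LogLevelSpec` is a hypothetical name and this is not Tez's own parsing code; it just encodes the valid and invalid examples listed in the Javadoc. Under these rules, DEBUG for a single class would be written as `INFO;<fully.qualified.ClassName>=DEBUG`.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class LogLevelSpec {
    final String rootLevel;              // global level, e.g. "DEBUG"
    final Map<String, String> overrides; // logger name -> level, in input order

    private LogLevelSpec(String rootLevel, Map<String, String> overrides) {
        this.rootLevel = rootLevel;
        this.overrides = Collections.unmodifiableMap(overrides);
    }

    // Parses values such as "DEBUG;org.apache.hadoop.ipc=INFO". The first
    // ';'-separated token must be a bare level; every later token must be logger=LEVEL.
    static LogLevelSpec parse(String value) {
        String[] parts = value.split(";");
        if (parts.length == 0 || parts[0].trim().isEmpty() || parts[0].contains("=")) {
            throw new IllegalArgumentException("global log level must come first: " + value);
        }
        Map<String, String> overrides = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            String part = parts[i].trim();
            if (part.isEmpty()) {
                continue; // tolerate an empty token such as ";;"
            }
            int eq = part.indexOf('=');
            if (eq <= 0 || eq == part.length() - 1) {
                throw new IllegalArgumentException("expected logger=LEVEL but got: " + part);
            }
            overrides.put(part.substring(0, eq).trim(), part.substring(eq + 1).trim());
        }
        return new LogLevelSpec(parts[0].trim(), overrides);
    }
}
```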

> The min-held-containers constraint is not enforced during query runtime 
> 
>
> Key: TEZ-2217
> URL: https://issues.apache.org/jira/browse/TEZ-2217
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Gopal V
>Assignee: Bikas Saha
> Attachments: TEZ-2217.1.patch, TEZ-2217.txt.bz2
>
>
> The min-held containers constraint is respected during query idle times, but 
> is not respected when a query is actually in motion.
> The AM releases unused containers during dag execution without checking for 
> min-held containers.
> {code}
> 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] 
> rm.YarnTaskSchedulerService: Container's idle timeout expired. Releasing 
> container, containerId=container_1424502260528_1348_01_13, 
> containerExpiryTime=1426891313264, idleTimeoutMin=5000
> 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] 
> rm.YarnTaskSchedulerService: Releasing unused container: 
> container_1424502260528_1348_01_13
> {code}
> Releasing containers is actually useful only after the AM has received a soft 
> pre-emption message; doing it on an idle cluster slows down one of the most 
> common query patterns in BI systems.
> {code}
> create temporary table smalltable as ...; 
> select ... bigtable JOIN smalltable ON ...;
> {code}
> The smaller query in the beginning throws away the pre-warmed capacity.




[jira] [Commented] (TEZ-2223) TestMockDAGAppMaster fails due to TEZ-2210 on mac

2015-03-23 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377121#comment-14377121
 ] 

Bikas Saha commented on TEZ-2223:
-

Perhaps TezMxBeanResourceCalculator can be used as the default.
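One way to read that suggestion, sketched in Java under assumptions: a hypothetical `CpuTimeSource` interface stands in for the plugin that ends up null on Mac, and when no platform plugin is available the code falls back to a JVM-only source built on `ThreadMXBean`. This illustrates the fallback pattern only; it is not the actual Tez calculator class.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical interface for whatever reports CPU usage to the task.
interface CpuTimeSource {
    long cumulativeCpuTimeMs();
}

final class CpuTimeSources {
    // JVM-only fallback: sums the CPU time of the currently live threads.
    // Threads that have already exited are not counted, so this is an approximation.
    static final class MxBeanCpuTimeSource implements CpuTimeSource {
        private final ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        @Override
        public long cumulativeCpuTimeMs() {
            if (!threads.isThreadCpuTimeSupported()) {
                return 0L;
            }
            long totalNanos = 0L;
            for (long id : threads.getAllThreadIds()) {
                long nanos = threads.getThreadCpuTime(id); // -1 if disabled or the thread died
                if (nanos > 0) {
                    totalNanos += nanos;
                }
            }
            return totalNanos / 1_000_000L;
        }
    }

    // Use the platform plugin when one exists; otherwise return the MXBean-based
    // source instead of leaving the caller with null.
    static CpuTimeSource orDefault(CpuTimeSource platformPlugin) {
        return platformPlugin != null ? platformPlugin : new MxBeanCpuTimeSource();
    }
}
```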

> TestMockDAGAppMaster fails due to TEZ-2210 on mac
> -
>
> Key: TEZ-2223
> URL: https://issues.apache.org/jira/browse/TEZ-2223
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>
> [~bikassaha] looks like TestMockDAGAppMaster fails due to TEZ-2210. 
> It would fail on Mac because cpuPlugin is null.




[jira] [Commented] (TEZ-2217) The min-held-containers constraint is not enforced during query runtime

2015-03-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377128#comment-14377128
 ] 

Hadoop QA commented on TEZ-2217:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12706789/TEZ-2217.1.patch
  against master revision 6d0b10a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/334//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/334//console

This message is automatically generated.

> The min-held-containers constraint is not enforced during query runtime 
> 
>
> Key: TEZ-2217
> URL: https://issues.apache.org/jira/browse/TEZ-2217
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Gopal V
>Assignee: Bikas Saha
> Attachments: TEZ-2217.1.patch, TEZ-2217.txt.bz2
>
>
> The min-held containers constraint is respected during query idle times, but 
> is not respected when a query is actually in motion.
> The AM releases unused containers during dag execution without checking for 
> min-held containers.
> {code}
> 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] 
> rm.YarnTaskSchedulerService: Container's idle timeout expired. Releasing 
> container, containerId=container_1424502260528_1348_01_13, 
> containerExpiryTime=1426891313264, idleTimeoutMin=5000
> 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] 
> rm.YarnTaskSchedulerService: Releasing unused container: 
> container_1424502260528_1348_01_13
> {code}
> Releasing containers is actually useful only after the AM has received a soft 
> pre-emption message; doing it on an idle cluster slows down one of the most 
> common query patterns in BI systems.
> {code}
> create temporary table smalltable as ...; 
> select ... bigtable JOIN smalltable ON ...;
> {code}
> The smaller query in the beginning throws away the pre-warmed capacity.




Success: TEZ-2217 PreCommit Build #334

2015-03-23 Thread Apache Jenkins Server

Jira: https://issues.apache.org/jira/browse/TEZ-2217
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/334/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2752 lines...]
[INFO] Final Memory: 67M/805M
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12706789/TEZ-2217.1.patch
  against master revision 6d0b10a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/334//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/334//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
c119360121f6701025a88826b76dec1f3083c568 logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #332
Archived 44 artifacts
Archive block size is 32768
Received 6 blocks and 2527096 bytes
Compression is 7.2%
Took 0.73 sec
Description set: TEZ-2217
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Updated] (TEZ-2217) The min-held-containers constraint is not enforced during query runtime

2015-03-23 Thread Gopal V (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-2217:
-
Attachment: TEZ-2217-debug.txt.bz2

Debug logs attached.

{code}
$ grep "Releasing unused" app-log.txt | wc -l
111
{code}

I always use {{--hiveconf tez.am.log.level="INFO;=DEBUG"}}, and that 
seems to have worked.

> The min-held-containers constraint is not enforced during query runtime 
> 
>
> Key: TEZ-2217
> URL: https://issues.apache.org/jira/browse/TEZ-2217
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Gopal V
>Assignee: Bikas Saha
> Attachments: TEZ-2217-debug.txt.bz2, TEZ-2217.1.patch, 
> TEZ-2217.txt.bz2
>
>
> The min-held containers constraint is respected during query idle times, but 
> is not respected when a query is actually in motion.
> The AM releases unused containers during dag execution without checking for 
> min-held containers.
> {code}
> 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] 
> rm.YarnTaskSchedulerService: Container's idle timeout expired. Releasing 
> container, containerId=container_1424502260528_1348_01_13, 
> containerExpiryTime=1426891313264, idleTimeoutMin=5000
> 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] 
> rm.YarnTaskSchedulerService: Releasing unused container: 
> container_1424502260528_1348_01_13
> {code}
> Releasing containers is actually useful only after the AM has received a soft 
> pre-emption message; doing it on an idle cluster slows down one of the most 
> common query patterns in BI systems.
> {code}
> create temporary table smalltable as ...; 
> select ... bigtable JOIN smalltable ON ...;
> {code}
> The smaller query in the beginning throws away the pre-warmed capacity.




Failed: TEZ-2217 PreCommit Build #337

2015-03-23 Thread Apache Jenkins Server

Jira: https://issues.apache.org/jira/browse/TEZ-2217
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/337/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 31 lines...]
HEAD is now at 6d0b10a TEZ-2176. Move all logging to slf4j. Contributed by 
Vasanth kumar RJ.
Previous HEAD position was 6d0b10a... TEZ-2176. Move all logging to slf4j. 
Contributed by Vasanth kumar RJ.
Switched to branch 'master'
Your branch is behind 'origin/master' by 30 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)
First, rewinding head to replay your work on top of it...
Fast-forwarded master to 6d0b10a8445d3c26b0958ce816c64b577a1608d9.
TEZ-2217 patch is being downloaded at Tue Mar 24 02:28:13 UTC 2015 from
http://issues.apache.org/jira/secure/attachment/12706809/TEZ-2217-debug.txt.bz2
patch:  Only garbage was found in the patch input.
patch:  Only garbage was found in the patch input.
patch:  Only garbage was found in the patch input.
The patch does not appear to apply with p0 to p2
PATCH APPLICATION FAILED




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  
http://issues.apache.org/jira/secure/attachment/12706809/TEZ-2217-debug.txt.bz2
  against master revision 6d0b10a.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/337//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
468b2e1ce34852fa777431321e7aaa5322b885d9 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #334
Archived 7 artifacts
Archive block size is 32768
Received 0 blocks and 1408632 bytes
Compression is 0.0%
Took 0.36 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
No tests ran.

Failed: TEZ-714 PreCommit Build #336

2015-03-23 Thread Apache Jenkins Server

Jira: https://issues.apache.org/jira/browse/TEZ-714
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/336/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 1229 lines...]


  Running tests 
  /home/jenkins/tools/maven/latest/bin/mvn clean install -fn -DTezPatchProcess
/home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build/build-tools/test-patch.sh:
 line 609: 
/home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build/../patchprocess/testrun.txt:
 No such file or directory
cat: 
/home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build/../patchprocess/testrun.txt:
 No such file or directory
awk: cannot open 
/home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build/../patchprocess/testrun.txt
 (No such file or directory)




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12706466/TEZ-714-2.patch
  against master revision 6d0b10a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to cause Findbugs 
(version 2.0.3) to fail.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The test build failed in  

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/336//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/336//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
92eeff2a0bc0fe4afb6396a0f6663a6b640cf699 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
No tests ran.

[jira] [Commented] (TEZ-2217) The min-held-containers constraint is not enforced during query runtime

2015-03-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377148#comment-14377148
 ] 

Hadoop QA commented on TEZ-2217:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  
http://issues.apache.org/jira/secure/attachment/12706809/TEZ-2217-debug.txt.bz2
  against master revision 6d0b10a.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/337//console

This message is automatically generated.

> The min-held-containers constraint is not enforced during query runtime 
> 
>
> Key: TEZ-2217
> URL: https://issues.apache.org/jira/browse/TEZ-2217
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Gopal V
>Assignee: Bikas Saha
> Attachments: TEZ-2217-debug.txt.bz2, TEZ-2217.1.patch, 
> TEZ-2217.txt.bz2
>
>
> The min-held containers constraint is respected during query idle times, but 
> is not respected when a query is actually in motion.
> The AM releases unused containers during dag execution without checking for 
> min-held containers.
> {code}
> 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] 
> rm.YarnTaskSchedulerService: Container's idle timeout expired. Releasing 
> container, containerId=container_1424502260528_1348_01_13, 
> containerExpiryTime=1426891313264, idleTimeoutMin=5000
> 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] 
> rm.YarnTaskSchedulerService: Releasing unused container: 
> container_1424502260528_1348_01_13
> {code}
> Releasing containers is actually useful only after the AM has received a soft 
> pre-emption message; doing it on an idle cluster slows down one of the most 
> common query patterns in BI systems.
> {code}
> create temporary table smalltable as ...; 
> select ... bigtable JOIN smalltable ON ...;
> {code}
> The smaller query in the beginning throws away the pre-warmed capacity.




[jira] [Commented] (TEZ-714) OutputCommitters should not run in the main AM dispatcher thread

2015-03-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377150#comment-14377150
 ] 

Hadoop QA commented on TEZ-714:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12706466/TEZ-714-2.patch
  against master revision 6d0b10a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to cause Findbugs 
(version 2.0.3) to fail.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The test build failed in  

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/336//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/336//console

This message is automatically generated.

> OutputCommitters should not run in the main AM dispatcher thread
> 
>
> Key: TEZ-714
> URL: https://issues.apache.org/jira/browse/TEZ-714
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Jeff Zhang
>Priority: Critical
> Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch, Vertex_2.pdf
>
>
> Follow-up JIRA from TEZ-41.
> 1) If there are multiple OutputCommitters on a Vertex, they can be run in 
> parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event 
> handling w.r.t. the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.




[jira] [Commented] (TEZ-2221) VertexGroup name should be unique

2015-03-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377180#comment-14377180
 ] 

Hadoop QA commented on TEZ-2221:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12706800/TEZ-2221-1.patch
  against master revision 6d0b10a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/335//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/335//console

This message is automatically generated.

> VertexGroup name should be unique
> -
>
> Key: TEZ-2221
> URL: https://issues.apache.org/jira/browse/TEZ-2221
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-2221-1.patch
>
>
> VertexGroupCommitStartedEvent and VertexGroupCommitFinishedEvent use the vertex 
> group name to identify a vertex group commit, so two vertex groups with the same 
> name will conflict. However, the current equals and hashCode of VertexGroup use 
> both the vertex group name and the member names.
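A minimal Java sketch of the kind of check the title asks for, assuming group names are validated when groups are added to a DAG (hypothetical `VertexGroupNameCheck`, not the Tez DAG API). Comparing names alone matters here: since VertexGroup equality also includes the member names, a set of VertexGroup objects would not catch two groups that share a name but have different members.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class VertexGroupNameCheck {
    // Rejects a DAG whose vertex groups share a name. Only names are compared,
    // because the group-commit recovery events identify a group by name alone.
    static void verifyUniqueGroupNames(List<String> groupNames) {
        Set<String> seen = new HashSet<>();
        for (String name : groupNames) {
            if (!seen.add(name)) {
                throw new IllegalStateException("VertexGroup name '" + name + "' is already used");
            }
        }
    }
}
```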




Success: TEZ-2221 PreCommit Build #335

2015-03-23 Thread Apache Jenkins Server

Jira: https://issues.apache.org/jira/browse/TEZ-2221
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/335/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2749 lines...]
[INFO] Final Memory: 70M/973M
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12706800/TEZ-2221-1.patch
  against master revision 6d0b10a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/335//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/335//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
d5affd0ff69e0697d9b68ca07d5e206cb522faa6 logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #334
Archived 44 artifacts
Archive block size is 32768
Received 21 blocks and 2035862 bytes
Compression is 25.3%
Took 0.75 sec
Description set: TEZ-2221
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-714) OutputCommitters should not run in the main AM dispatcher thread

2015-03-23 Thread Jeff Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377209#comment-14377209
 ] 

Jeff Zhang commented on TEZ-714:


bq. Can this be fixed by having the events for both be different? But still 
handled in the same transition.
It could, but this may make the transition complicated. Currently we need to 
differentiate these two kinds of commits; besides, there are two possible states 
(RUNNING, COMMITTING) when the commit happens, and we also need to handle two 
different outcomes (commit succeeded and commit failed). That makes 2 x 2 x 2 = 8 
different cases in one transition, which may be difficult to read.
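The case count can be checked mechanically. This Java sketch, with hypothetical enum names, enumerates the commit kind, the state when the commit happens, and the commit outcome, all of which a single merged transition would have to distinguish.

```java
import java.util.ArrayList;
import java.util.List;

final class CommitCaseCount {
    enum CommitKind { VERTEX_COMMIT, GROUP_COMMIT }
    enum StateWhenCommitting { RUNNING, COMMITTING }
    enum Outcome { SUCCEEDED, FAILED }

    // Lists every (kind, state, outcome) combination a merged transition would handle.
    static List<String> allCases() {
        List<String> cases = new ArrayList<>();
        for (CommitKind kind : CommitKind.values()) {
            for (StateWhenCommitting state : StateWhenCommitting.values()) {
                for (Outcome outcome : Outcome.values()) {
                    cases.add(kind + "/" + state + "/" + outcome);
                }
            }
        }
        return cases;
    }
}
```

`CommitCaseCount.allCases().size()` returns 8, matching the 2 x 2 x 2 count above.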

bq. Is this recovery log written relevant only in the non-commit-at-end case 
where group commits can happen before the DAG finishes?
Yes

bq.  Maybe you can create a new TestCommit that starts from scratch without the 
hacks in TestVertexImpl.
Yes, this is what I plan to do. 

bq. Is this for VertexImpl or DAGImpl? That sounds like a bug. Is that relevant 
to the commit operation though?
It is relevant to the abort. Currently, in the DAG's InternalErrorTransition (no 
matter what state the DAG is in), the DAG aborts directly and goes to the ERROR 
state without waiting for the vertices to finish. 
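For contrast, a hedged Java sketch of the alternative implied here, in which the DAG waits for its vertices before entering ERROR. The class, states, and method names are hypothetical and the actual kill and abort calls are left out; it only models the bookkeeping of which vertices are still running.

```java
import java.util.HashSet;
import java.util.Set;

final class DagErrorSketch {
    enum DagState { RUNNING, TERMINATING, ERROR }

    private final Set<String> runningVertices;
    private DagState state = DagState.RUNNING;

    DagErrorSketch(Set<String> vertices) {
        this.runningVertices = new HashSet<>(vertices);
    }

    // Internal error: instead of jumping straight to ERROR, move to TERMINATING
    // and wait for the vertices (the kill requests themselves are omitted here).
    void onInternalError() {
        if (state == DagState.RUNNING) {
            state = runningVertices.isEmpty() ? DagState.ERROR : DagState.TERMINATING;
        }
    }

    // A vertex reached a terminal state; once the last one does, the DAG enters
    // ERROR (the output abort would happen at this point; it is omitted here).
    void onVertexFinished(String vertexName) {
        runningVertices.remove(vertexName);
        if (state == DagState.TERMINATING && runningVertices.isEmpty()) {
            state = DagState.ERROR;
        }
    }

    DagState state() {
        return state;
    }
}
```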


> OutputCommitters should not run in the main AM dispatcher thread
> 
>
> Key: TEZ-714
> URL: https://issues.apache.org/jira/browse/TEZ-714
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Jeff Zhang
>Priority: Critical
> Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch, Vertex_2.pdf
>
>
> Follow-up JIRA from TEZ-41.
> 1) If there are multiple OutputCommitters on a Vertex, they can be run in 
> parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event 
> handling w.r.t. the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.




[jira] [Updated] (TEZ-2097) TEZ-UI Add dag logs

2015-03-23 Thread Hitesh Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2097:
-
Priority: Critical  (was: Blocker)

> TEZ-UI Add dag logs
> ---
>
> Key: TEZ-2097
> URL: https://issues.apache.org/jira/browse/TEZ-2097
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Jeff Zhang
>Priority: Critical
>
> If a DAG fails due to an AM error, there's no way to check the DAG logs in the Tez UI. 
> Users have to grab the app logs. 




[jira] [Commented] (TEZ-2097) TEZ-UI Add dag logs

2015-03-23 Thread Hitesh Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377263#comment-14377263
 ] 

Hitesh Shah commented on TEZ-2097:
--

Downgrading to critical. 

> TEZ-UI Add dag logs
> ---
>
> Key: TEZ-2097
> URL: https://issues.apache.org/jira/browse/TEZ-2097
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Jeff Zhang
>Priority: Critical
>
> If a DAG fails due to an AM error, there's no way to check the DAG logs in the Tez UI. 
> Users have to grab the app logs. 




[jira] [Updated] (TEZ-2097) TEZ-UI Add dag logs

2015-03-23 Thread Hitesh Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2097:
-
Target Version/s: 0.6.2  (was: 0.6.1)

> TEZ-UI Add dag logs
> ---
>
> Key: TEZ-2097
> URL: https://issues.apache.org/jira/browse/TEZ-2097
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Jeff Zhang
>Priority: Critical
>
> If a DAG fails due to an AM error, there's no way to check the DAG logs in the Tez UI. 
> Users have to grab the app logs. 




[jira] [Commented] (TEZ-2205) Tez still tries to post to ATS when yarn.timeline-service.enabled=false

2015-03-23 Thread Hitesh Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377268#comment-14377268
 ] 

Hitesh Shah commented on TEZ-2205:
--

[~rohini] [~hagleitn] any comments/concerns on the approach that we plan to 
take? 

> Tez still tries to post to ATS when yarn.timeline-service.enabled=false
> ---
>
> Key: TEZ-2205
> URL: https://issues.apache.org/jira/browse/TEZ-2205
> Project: Apache Tez
>  Issue Type: Sub-task
>Affects Versions: 0.6.1
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: TEZ-2205.wip.patch
>
>
> When yarn.timeline-service.enabled=false is set, Tez still tries to post to ATS 
> but hits an error because the token is not found. This does not fail the job, 
> because of the earlier fix that stops ATS posting errors from failing the job, 
> but Tez should not be trying to post to ATS in the first place.
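A minimal sketch of the intended behaviour: read the flag named in the description before doing anything ATS-related, and skip posting entirely when it is false. The map-based configuration, the class name and `maybePost` are hypothetical. Treating a missing key as "disabled" is an assumption of this sketch, not a statement about YARN's default.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: names are illustrative, not Tez APIs.
public class TimelineGuardSketch {
    // Key taken from the issue description.
    static final String TIMELINE_ENABLED_KEY = "yarn.timeline-service.enabled";

    /** True only if the key is explicitly "true" (case-insensitive); missing means disabled here. */
    public static boolean timelineEnabled(Map<String, String> conf) {
        return Boolean.parseBoolean(conf.get(TIMELINE_ENABLED_KEY));
    }

    /**
     * Posts only when the timeline service is enabled. When it is disabled, nothing
     * ATS-related (client creation, token lookup, the post itself) is attempted.
     */
    public static boolean maybePost(Map<String, String> conf, Runnable postToAts) {
        if (!timelineEnabled(conf)) {
            return false;
        }
        postToAts.run();
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(TIMELINE_ENABLED_KEY, "false");
        AtomicInteger posts = new AtomicInteger();
        maybePost(conf, posts::incrementAndGet);
        System.out.println("posts attempted: " + posts.get());
    }
}
```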




[jira] [Commented] (TEZ-2047) Build fails against hadoop-2.2 post TEZ-2018

2015-03-23 Thread Hitesh Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377270#comment-14377270
 ] 

Hitesh Shah commented on TEZ-2047:
--

[~pramachandran] Sorry for the delay in the review. 

Comments: 

The basic change looks fine, but I am not sure how we are enforcing HTTP-only 
(no SSL) mode with the current implementation. The WebApps code seems to 
eventually look into the config for the YARN HTTP policy. Should the WebUIService 
set that up explicitly to enforce HTTP only?
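One possible answer to the question above is to pin the policy in a copy of the configuration before handing it to the web app. Whatever WebApps reads back then cannot select HTTPS, and the caller's configuration is left untouched. The key `yarn.http.policy` and the value `HTTP_ONLY` are assumed here to be YARN's names for this setting and should be checked against the Hadoop version being built against. The map stands in for Hadoop's `Configuration`, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for org.apache.hadoop.conf.Configuration.
public class HttpOnlyPolicySketch {
    // Assumed key/value; verify against the YARN version in use.
    static final String YARN_HTTP_POLICY_KEY = "yarn.http.policy";
    static final String HTTP_ONLY = "HTTP_ONLY";

    /** Returns a copy of conf with the HTTP policy forced to HTTP_ONLY; the input is not modified. */
    public static Map<String, String> forceHttpOnly(Map<String, String> conf) {
        Map<String, String> copy = new HashMap<>(conf);
        copy.put(YARN_HTTP_POLICY_KEY, HTTP_ONLY);
        return copy;
    }
}
```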

> Build fails against hadoop-2.2 post TEZ-2018
> 
>
> Key: TEZ-2047
> URL: https://issues.apache.org/jira/browse/TEZ-2047
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Prakash Ramachandran
>Priority: Blocker
> Attachments: TEZ-2047.1.patch
>
>
> Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) 
> on project tez-dag: Compilation failure: Compilation failure:
> [ERROR] 
> /home/jenkins/jenkins-slave/workspace/Tez-Build-Hadoop-2.2/tez-dag/src/main/java/org/apache/tez/dag/app/web/WebUIService.java:[85,13]
>  cannot find symbol
> [ERROR] symbol  : method 
> withHttpPolicy(org.apache.hadoop.conf.Configuration,org.apache.hadoop.http.HttpConfig.Policy)
> [ERROR] location: class 
> org.apache.hadoop.yarn.webapp.WebApps.Builder
> [ERROR] 
> /home/jenkins/jenkins-slave/workspace/Tez-Build-Hadoop-2.2/tez-dag/src/main/java/org/apache/tez/dag/app/web/WebUIService.java:[87,45]
>  cannot find symbol
> [ERROR] symbol  : method getConnectorAddress(int)
> [ERROR] location: class org.apache.hadoop.http.HttpServer




[jira] [Commented] (TEZ-986) Make conf set on DAG and vertex available in jobhistory

2015-03-23 Thread Hitesh Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377271#comment-14377271
 ] 

Hitesh Shah commented on TEZ-986:
-

Moving this out to 0.6.2. 

Not sure if [~Sreenath] has had a chance to look at this jira.

> Make conf set on DAG and vertex available in jobhistory
> ---
>
> Key: TEZ-986
> URL: https://issues.apache.org/jira/browse/TEZ-986
> Project: Apache Tez
>  Issue Type: Sub-task
>  Components: UI
>Reporter: Rohini Palaniswamy
>Priority: Blocker
>
> Would like to have the conf set on DAG and Vertex
>   1) viewable in Tez UI after the job completes. This is essential for 
> debugging jobs.
>   2) We have processes that parse jobconf.xml from job history (HDFS) and 
> load them into Hive tables for analysis. Would like Tez to also make all 
> the configuration (byte array) available in job history so that we can 
> similarly parse it. 1) mandates that you store it in HDFS. 2) is just to 
> say: make the stored format a contract that others can rely on for parsing.




[jira] [Updated] (TEZ-986) Make conf set on DAG and vertex available in jobhistory

2015-03-23 Thread Hitesh Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-986:

Target Version/s: 0.6.2  (was: 0.6.1)

> Make conf set on DAG and vertex available in jobhistory
> ---
>
> Key: TEZ-986
> URL: https://issues.apache.org/jira/browse/TEZ-986
> Project: Apache Tez
>  Issue Type: Sub-task
>  Components: UI
>Reporter: Rohini Palaniswamy
>Priority: Blocker
>
> Would like to have the conf set on DAG and Vertex
>   1) viewable in Tez UI after the job completes. This is essential for 
> debugging jobs.
>   2) We have processes that parse jobconf.xml from job history (HDFS) and 
> load them into Hive tables for analysis. Would like Tez to also make all 
> the configuration (byte array) available in job history so that we can 
> similarly parse it. 1) mandates that you store it in HDFS. 2) is just to 
> say: make the stored format a contract that others can rely on for parsing.




[jira] [Updated] (TEZ-2192) Relocalization does not check for source

2015-03-23 Thread Hitesh Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2192:
-
Target Version/s: 0.5.4, 0.6.1  (was: 0.5.4)

> Relocalization does not check for source
> 
>
> Key: TEZ-2192
> URL: https://issues.apache.org/jira/browse/TEZ-2192
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.5.2
>Reporter: Rohini Palaniswamy
>Priority: Blocker
>
> PIG-4443 spills the input splits to disk if the serialized split size is greater 
> than some threshold. It runs into issues with relocalization when more than one 
> vertex has a job.split file. If a job.split file is already present on container 
> reuse, it is reused, causing wrong data to be read.
> Either we need a way to turn off relocalization, or a check of the source+timestamp 
> that redownloads the file during relocalization. 
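The second option in the description, checking source and timestamp before reusing a file, can be sketched as follows. The sketch assumes the reused container remembers, for each local file name, which source and timestamp produced it. `RelocalizationCheckSketch` and its methods are hypothetical, the download itself is stubbed out, and this is not Tez's relocalization code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: names are illustrative; the download is stubbed.
public class RelocalizationCheckSketch {

    /** Identity of a localized file: where it came from and its modification time. */
    static final class ResourceId {
        final String sourceUri;
        final long timestamp;

        ResourceId(String sourceUri, long timestamp) {
            this.sourceUri = sourceUri;
            this.timestamp = timestamp;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ResourceId)) return false;
            ResourceId other = (ResourceId) o;
            return timestamp == other.timestamp && sourceUri.equals(other.sourceUri);
        }

        @Override
        public int hashCode() {
            return Objects.hash(sourceUri, timestamp);
        }
    }

    /** Local file name -> identity of the resource currently occupying it in this container. */
    private final Map<String, ResourceId> localized = new HashMap<>();
    private int downloads = 0;

    /**
     * Returns true if the file had to be (re)downloaded. A file that only shares the
     * local name (e.g. job.split written for a different vertex) is not reused.
     */
    public boolean localize(String localName, String sourceUri, long timestamp) {
        ResourceId wanted = new ResourceId(sourceUri, timestamp);
        if (wanted.equals(localized.get(localName))) {
            return false; // same source and timestamp: safe to reuse
        }
        // A real implementation would fetch sourceUri into localName here.
        localized.put(localName, wanted);
        downloads++;
        return true;
    }

    public int downloads() {
        return downloads;
    }
}
```

The paths used with it are illustrative, for example `hdfs:///tmp/vertex1/job.split` followed by `hdfs:///tmp/vertex2/job.split` under the same local name `job.split`. The second call triggers a fresh download instead of silently reusing the first vertex's splits.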




[jira] [Commented] (TEZ-1421) MRCombiner throws NPE in MapredWordCount on master branch

2015-03-23 Thread Hitesh Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377280#comment-14377280
 ] 

Hitesh Shah commented on TEZ-1421:
--

[~ozawa] In that case (given that the solution seems to be non-trivial), I think 
we can move the target version to 0.7.0, given that not many other folks have 
reported this issue. Agree?

> MRCombiner throws NPE in MapredWordCount on master branch
> -
>
> Key: TEZ-1421
> URL: https://issues.apache.org/jira/browse/TEZ-1421
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
>Priority: Blocker
>
> I tested MapredWordCount against 70GB generated by RandomTextWriter. When a 
> Combiner runs, it throws an NPE. It looks like setCombinerClass doesn't work 
> correctly.
> {quote}
> Caused by: java.lang.RuntimeException: java.lang.NullPointerException
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.tez.mapreduce.combine.MRCombiner.runOldCombiner(MRCombiner.java:122)
> at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:112)
> at 
> org.apache.tez.runtime.library.common.shuffle.impl.MergeManager.runCombineProcessor(MergeManager.java:472)
> at 
> org.apache.tez.runtime.library.common.shuffle.impl.MergeManager$InMemoryMerger.merge(MergeManager.java:605)
> at 
> org.apache.tez.runtime.library.common.shuffle.impl.MergeThread.run(MergeThread.java:89)
> {quote}
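The trace shows a RuntimeException wrapping an NPE raised inside ReflectionUtils.newInstance while MRCombiner.runOldCombiner creates the combiner. If the cause is a combiner class that resolved to null, which is the report's suspicion about setCombinerClass, a fail-fast guard turns this into a self-explanatory error. The sketch below is hypothetical. It uses plain Java reflection in place of Hadoop's ReflectionUtils, and the `configKey` argument only appears in the message.

```java
import java.lang.reflect.Constructor;

// Hypothetical sketch: plain reflection stands in for Hadoop's ReflectionUtils.
public class CombinerGuardSketch {

    /**
     * Instantiates the configured combiner, failing with a descriptive message
     * instead of a bare NullPointerException when no class was resolved.
     */
    public static <T> T newCombiner(Class<T> combinerClass, String configKey) {
        if (combinerClass == null) {
            throw new IllegalStateException("No combiner class resolved from " + configKey
                + "; was setCombinerClass applied to the configuration used by this task?");
        }
        try {
            Constructor<T> ctor = combinerClass.getDeclaredConstructor();
            ctor.setAccessible(true); // allow non-public combiner classes, as ReflectionUtils does
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("Cannot instantiate combiner " + combinerClass.getName(), e);
        }
    }
}
```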




[jira] [Issue Comment Deleted] (TEZ-1421) MRCombiner throws NPE in MapredWordCount on master branch

2015-03-23 Thread Tsuyoshi Ozawa (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated TEZ-1421:

Comment: was deleted

(was: [~hitesh] Yes, I agree with you.)

> MRCombiner throws NPE in MapredWordCount on master branch
> -
>
> Key: TEZ-1421
> URL: https://issues.apache.org/jira/browse/TEZ-1421
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
>Priority: Blocker
>
> I tested MapredWordCount against 70GB generated by RandomTextWriter. When a 
> Combiner runs, it throws an NPE. It looks like setCombinerClass doesn't work 
> correctly.
> {quote}
> Caused by: java.lang.RuntimeException: java.lang.NullPointerException
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.tez.mapreduce.combine.MRCombiner.runOldCombiner(MRCombiner.java:122)
> at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:112)
> at 
> org.apache.tez.runtime.library.common.shuffle.impl.MergeManager.runCombineProcessor(MergeManager.java:472)
> at 
> org.apache.tez.runtime.library.common.shuffle.impl.MergeManager$InMemoryMerger.merge(MergeManager.java:605)
> at 
> org.apache.tez.runtime.library.common.shuffle.impl.MergeThread.run(MergeThread.java:89)
> {quote}




[jira] [Commented] (TEZ-1421) MRCombiner throws NPE in MapredWordCount on master branch

2015-03-23 Thread Tsuyoshi Ozawa (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377293#comment-14377293
 ] 

Tsuyoshi Ozawa commented on TEZ-1421:
-

[~hitesh] Yes, I agree with you.

> MRCombiner throws NPE in MapredWordCount on master branch
> -
>
> Key: TEZ-1421
> URL: https://issues.apache.org/jira/browse/TEZ-1421
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
>Priority: Blocker
>
> I tested MapredWordCount against 70GB generated by RandomTextWriter. When a 
> Combiner runs, it throws an NPE. It looks like setCombinerClass doesn't work 
> correctly.
> {quote}
> Caused by: java.lang.RuntimeException: java.lang.NullPointerException
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.tez.mapreduce.combine.MRCombiner.runOldCombiner(MRCombiner.java:122)
> at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:112)
> at 
> org.apache.tez.runtime.library.common.shuffle.impl.MergeManager.runCombineProcessor(MergeManager.java:472)
> at 
> org.apache.tez.runtime.library.common.shuffle.impl.MergeManager$InMemoryMerger.merge(MergeManager.java:605)
> at 
> org.apache.tez.runtime.library.common.shuffle.impl.MergeThread.run(MergeThread.java:89)
> {quote}




[jira] [Commented] (TEZ-1421) MRCombiner throws NPE in MapredWordCount on master branch

2015-03-23 Thread Tsuyoshi Ozawa (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377292#comment-14377292
 ] 

Tsuyoshi Ozawa commented on TEZ-1421:
-

[~hitesh] Yes, I agree with you.

> MRCombiner throws NPE in MapredWordCount on master branch
> -
>
> Key: TEZ-1421
> URL: https://issues.apache.org/jira/browse/TEZ-1421
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
>Priority: Blocker
>
> I tested MapredWordCount against 70GB generated by RandomTextWriter. When a 
> Combiner runs, it throws an NPE. It looks like setCombinerClass doesn't work 
> correctly.
> {quote}
> Caused by: java.lang.RuntimeException: java.lang.NullPointerException
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.tez.mapreduce.combine.MRCombiner.runOldCombiner(MRCombiner.java:122)
> at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:112)
> at 
> org.apache.tez.runtime.library.common.shuffle.impl.MergeManager.runCombineProcessor(MergeManager.java:472)
> at 
> org.apache.tez.runtime.library.common.shuffle.impl.MergeManager$InMemoryMerger.merge(MergeManager.java:605)
> at 
> org.apache.tez.runtime.library.common.shuffle.impl.MergeThread.run(MergeThread.java:89)
> {quote}




[jira] [Commented] (TEZ-1909) Remove need to copy over all events from attempt 1 to attempt 2 dir

2015-03-23 Thread Hitesh Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377297#comment-14377297
 ] 

Hitesh Shah commented on TEZ-1909:
--

Comments:

{code}
LOG.warn("Other recovery files will be skipped due to error in the previous recovery file");
{code}
  - please add the file name to this line as well as its length 

For TEZ_AM_RECOVERY_HANDLE_REMAINING_EVENT_WHEN_STOPPED, maybe change to 
TEZ_TEST_... and likewise change property value. No scope defined? 

It seems like the patch for this jira has been merged with fixes for a 
different jira? Can these be separated out? 




> Remove need to copy over all events from attempt 1 to attempt 2 dir
> ---
>
> Key: TEZ-1909
> URL: https://issues.apache.org/jira/browse/TEZ-1909
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Jeff Zhang
> Attachments: TEZ-1909-1.patch, TEZ-1909-2.patch, TEZ-1909-3.patch
>
>
> Use of file versions should prevent the need for copying over data into a 
> second attempt dir. Care needs to be taken with "last corrupt record" 
> handling. 




[jira] [Comment Edited] (TEZ-1909) Remove need to copy over all events from attempt 1 to attempt 2 dir

2015-03-23 Thread Hitesh Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377297#comment-14377297
 ] 

Hitesh Shah edited comment on TEZ-1909 at 3/24/15 5:07 AM:
---

Comments:

{code}
LOG.warn("Other recovery files will be skipped due to error in the previous recovery file");
{code}
  - please add the file name to this log line

For TEZ_AM_RECOVERY_HANDLE_REMAINING_EVENT_WHEN_STOPPED, maybe change to 
TEZ_TEST_... and likewise change property value. No scope defined? 

It seems like the patch for this jira has been merged with fixes for a 
different jira? Can these be separated out? 
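The review asks for the file name in this warning. Below is a minimal sketch of the "stop at the first corrupt recovery file" loop that includes that detail. It assumes recovery files are processed in attempt order and uses a caller-supplied check as a stand-in for the real parser. The class name and file paths are hypothetical, and java.util.logging substitutes for the logger Tez uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.logging.Logger;

// Hypothetical sketch: the Predicate stands in for parsing a recovery file.
public class RecoveryReplaySketch {
    private static final Logger LOG = Logger.getLogger(RecoveryReplaySketch.class.getName());

    /**
     * Replays recovery files in attempt order and stops at the first one that fails
     * to parse. Later files are skipped, and the warning names the offending file.
     */
    public static List<String> replay(List<String> recoveryFiles, Predicate<String> parsesCleanly) {
        List<String> replayed = new ArrayList<>();
        for (int i = 0; i < recoveryFiles.size(); i++) {
            String file = recoveryFiles.get(i);
            if (!parsesCleanly.test(file)) {
                int skipped = recoveryFiles.size() - i - 1;
                LOG.warning("Other recovery files will be skipped due to error in the previous"
                    + " recovery file, file=" + file + ", skippedFiles=" + skipped);
                break;
            }
            replayed.add(file);
        }
        return replayed;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("attempt_1/recovery", "attempt_2/recovery");
        System.out.println(replay(files, f -> true));
    }
}
```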





was (Author: hitesh):
Comments:

{code}
LOG.warn("Other recovery files will be skipped due to error in the previous recovery file");
{code}
  - please add the file name to this line as well as its length 

For TEZ_AM_RECOVERY_HANDLE_REMAINING_EVENT_WHEN_STOPPED, maybe change to 
TEZ_TEST_... and likewise change property value. No scope defined? 

It seems like the patch for this jira has been merged with fixes for a 
different jira? Can these be separated out? 




> Remove need to copy over all events from attempt 1 to attempt 2 dir
> ---
>
> Key: TEZ-1909
> URL: https://issues.apache.org/jira/browse/TEZ-1909
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Jeff Zhang
> Attachments: TEZ-1909-1.patch, TEZ-1909-2.patch, TEZ-1909-3.patch
>
>
> Use of file versions should prevent the need for copying over data into a 
> second attempt dir. Care needs to be taken with "last corrupt record" 
> handling. 




[jira] [Commented] (TEZ-2221) VertexGroup name should be unqiue

2015-03-23 Thread Hitesh Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377301#comment-14377301
 ] 

Hitesh Shah commented on TEZ-2221:
--

What happens if someone does the following?

{code}
dag.createVertexGroup("group_1", v1,v2);
dag.createVertexGroup("group_2", v1,v2);
{code}

This should also be disallowed. Correct?
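The thread treats two cases as invalid: reusing a group name, and grouping the same vertices under two different names. A sketch of a DAG-build-time check for both follows. It assumes groups are identified by plain vertex-name strings. `VertexGroupValidatorSketch` is hypothetical and is not the implementation of `DAG.createVertexGroup`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: vertex names stand in for Vertex objects.
public class VertexGroupValidatorSketch {
    private final Map<String, Set<String>> membersByGroup = new HashMap<>();
    private final Map<Set<String>, String> groupByMembers = new HashMap<>();

    /** Registers a group; rejects a reused name or a member set already grouped under another name. */
    public void createVertexGroup(String groupName, String... vertexNames) {
        Set<String> members = new HashSet<>(Arrays.asList(vertexNames)); // member order does not matter
        if (membersByGroup.containsKey(groupName)) {
            throw new IllegalArgumentException("VertexGroup " + groupName + " already defined");
        }
        String existing = groupByMembers.get(members);
        if (existing != null) {
            throw new IllegalArgumentException(
                "Vertices " + members + " are already grouped as " + existing);
        }
        membersByGroup.put(groupName, members);
        groupByMembers.put(members, groupName);
    }
}
```

With this check, the two calls in the example above fail on the second one, because `{v1, v2}` is already registered as `group_1`.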

> VertexGroup name should be unqiue
> -
>
> Key: TEZ-2221
> URL: https://issues.apache.org/jira/browse/TEZ-2221
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-2221-1.patch
>
>
> VertexGroupCommitStartedEvent & VertexGroupCommitFinishedEvent use the vertex 
> group name to identify the vertex group commit, so two vertex groups with the 
> same name will conflict. However, the current equals & hashCode of VertexGroup 
> use both the vertex group name and the member names.




[jira] [Updated] (TEZ-1421) MRCombiner throws NPE in MapredWordCount on master branch

2015-03-23 Thread Hitesh Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1421:
-
Priority: Critical  (was: Blocker)

> MRCombiner throws NPE in MapredWordCount on master branch
> -
>
> Key: TEZ-1421
> URL: https://issues.apache.org/jira/browse/TEZ-1421
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
>Priority: Critical
>
> I tested MapredWordCount against 70GB generated by RandomTextWriter. When a 
> Combiner runs, it throws an NPE. It looks like setCombinerClass doesn't work 
> correctly.
> {quote}
> Caused by: java.lang.RuntimeException: java.lang.NullPointerException
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.tez.mapreduce.combine.MRCombiner.runOldCombiner(MRCombiner.java:122)
> at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:112)
> at 
> org.apache.tez.runtime.library.common.shuffle.impl.MergeManager.runCombineProcessor(MergeManager.java:472)
> at 
> org.apache.tez.runtime.library.common.shuffle.impl.MergeManager$InMemoryMerger.merge(MergeManager.java:605)
> at 
> org.apache.tez.runtime.library.common.shuffle.impl.MergeThread.run(MergeThread.java:89)
> {quote}




[jira] [Updated] (TEZ-1421) MRCombiner throws NPE in MapredWordCount on master branch

2015-03-23 Thread Hitesh Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1421:
-
Target Version/s: 0.7.0  (was: 0.6.1)

> MRCombiner throws NPE in MapredWordCount on master branch
> -
>
> Key: TEZ-1421
> URL: https://issues.apache.org/jira/browse/TEZ-1421
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
>Priority: Blocker
>
> I tested MapredWordCount against 70GB generated by RandomTextWriter. When a 
> Combiner runs, it throws an NPE. It looks like setCombinerClass doesn't work 
> correctly.
> {quote}
> Caused by: java.lang.RuntimeException: java.lang.NullPointerException
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.tez.mapreduce.combine.MRCombiner.runOldCombiner(MRCombiner.java:122)
> at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:112)
> at 
> org.apache.tez.runtime.library.common.shuffle.impl.MergeManager.runCombineProcessor(MergeManager.java:472)
> at 
> org.apache.tez.runtime.library.common.shuffle.impl.MergeManager$InMemoryMerger.merge(MergeManager.java:605)
> at 
> org.apache.tez.runtime.library.common.shuffle.impl.MergeThread.run(MergeThread.java:89)
> {quote}




[jira] [Updated] (TEZ-2204) TestAMRecovery increasingly flaky on jenkins builds.

2015-03-23 Thread Hitesh Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2204:
-
Target Version/s: 0.7.0  (was: 0.5.4)

> TestAMRecovery increasingly flaky on jenkins builds. 
> -
>
> Key: TEZ-2204
> URL: https://issues.apache.org/jira/browse/TEZ-2204
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Jeff Zhang
> Attachments: TEZ-2204-1.patch, TEZ-2204-2.patch, TEZ-2204-3.patch, 
> TEZ-2204-4.patch
>
>
> In recent pre-commit builds and daily builds, there seem to have been some 
> occurrences of TestAMRecovery failing or timing out. 




[jira] [Commented] (TEZ-2204) TestAMRecovery increasingly flaky on jenkins builds.

2015-03-23 Thread Hitesh Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377310#comment-14377310
 ] 

Hitesh Shah commented on TEZ-2204:
--

Comments:

{code}
// Don't handle events if DAGAppMaster is in the STOPPED state;
// otherwise a deadlock may happen (TEZ-2204).
if (DAGAppMaster.this.getServiceState() == STATE.STOPPED) {
  return;
}
{code}

Can you add a log message to identify what events are being received after the 
AM is stopped? 

+1 after the above comment is addressed. 
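The requested log message could look roughly like this. It is a hypothetical, self-contained stand-in for the dispatcher check quoted above: events are plain strings, the service state is a local enum rather than Hadoop's service state, and java.util.logging replaces the AM's logger.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical sketch: not the DAGAppMaster dispatcher code.
public class StoppedDispatcherSketch {
    private static final Logger LOG = Logger.getLogger(StoppedDispatcherSketch.class.getName());

    enum ServiceState { STARTED, STOPPED }

    private volatile ServiceState state = ServiceState.STARTED;
    // Single-threaded bookkeeping for the sketch; a real dispatcher needs thread-safe structures.
    private final List<String> handled = new ArrayList<>();
    private final List<String> dropped = new ArrayList<>();

    public void stop() {
        state = ServiceState.STOPPED;
    }

    public void handle(String event) {
        // Don't handle events once stopped; otherwise a deadlock may happen (TEZ-2204).
        if (state == ServiceState.STOPPED) {
            // Identify which event arrived after the AM was stopped.
            LOG.info("Ignoring event received after AM was stopped: " + event);
            dropped.add(event);
            return;
        }
        handled.add(event);
    }

    public List<String> handled() {
        return handled;
    }

    public List<String> dropped() {
        return dropped;
    }
}
```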

> TestAMRecovery increasingly flaky on jenkins builds. 
> -
>
> Key: TEZ-2204
> URL: https://issues.apache.org/jira/browse/TEZ-2204
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Jeff Zhang
> Attachments: TEZ-2204-1.patch, TEZ-2204-2.patch, TEZ-2204-3.patch, 
> TEZ-2204-4.patch
>
>
> In recent pre-commit builds and daily builds, there seem to have been some 
> occurrences of TestAMRecovery failing or timing out. 




[jira] [Commented] (TEZ-2221) VertexGroup name should be unqiue

2015-03-23 Thread Jeff Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377323#comment-14377323
 ] 

Jeff Zhang commented on TEZ-2221:
-

bq. This should also be disallowed. Correct?
Yes, it is not allowed. 

> VertexGroup name should be unqiue
> -
>
> Key: TEZ-2221
> URL: https://issues.apache.org/jira/browse/TEZ-2221
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-2221-1.patch
>
>
> VertexGroupCommitStartedEvent & VertexGroupCommitFinishedEvent use the vertex 
> group name to identify the vertex group commit, so two vertex groups with the 
> same name will conflict. However, the current equals & hashCode of VertexGroup 
> use both the vertex group name and the member names.




[jira] [Commented] (TEZ-2221) VertexGroup name should be unqiue

2015-03-23 Thread Hitesh Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377327#comment-14377327
 ] 

Hitesh Shah commented on TEZ-2221:
--

Sorry - should have clarified. The test is being changed to not test that 
condition.

> VertexGroup name should be unqiue
> -
>
> Key: TEZ-2221
> URL: https://issues.apache.org/jira/browse/TEZ-2221
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-2221-1.patch
>
>
> VertexGroupCommitStartedEvent & VertexGroupCommitFinishedEvent use the vertex 
> group name to identify the vertex group commit, so two vertex groups with the 
> same name will conflict. However, the current equals & hashCode of VertexGroup 
> use both the vertex group name and the member names.




[jira] [Commented] (TEZ-2221) VertexGroup name should be unqiue

2015-03-23 Thread Jeff Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377330#comment-14377330
 ] 

Jeff Zhang commented on TEZ-2221:
-

In the previous test case we compared vertex groups using both group_name and 
members; I changed the test case to indicate that we now compare only by 
group name.
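Comparing by group name only amounts to basing equals and hashCode on the name alone. That matches how VertexGroupCommitStartedEvent and VertexGroupCommitFinishedEvent identify a group. The class below is a hypothetical stand-in, not the Tez VertexGroup source.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a vertex group; not the Tez class.
public final class VertexGroupSketch {
    private final String groupName;
    private final Set<String> members;

    public VertexGroupSketch(String groupName, String... members) {
        this.groupName = Objects.requireNonNull(groupName, "groupName");
        this.members = Collections.unmodifiableSet(new LinkedHashSet<>(Arrays.asList(members)));
    }

    public String getGroupName() {
        return groupName;
    }

    public Set<String> getMembers() {
        return members;
    }

    // Identity is the group name alone, matching how the group commit events identify a group.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof VertexGroupSketch)) return false;
        return groupName.equals(((VertexGroupSketch) o).groupName);
    }

    @Override
    public int hashCode() {
        return groupName.hashCode();
    }
}
```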


> VertexGroup name should be unqiue
> -
>
> Key: TEZ-2221
> URL: https://issues.apache.org/jira/browse/TEZ-2221
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-2221-1.patch
>
>
> VertexGroupCommitStartedEvent & VertexGroupCommitFinishedEvent use vertex 
> group name to identify the vertex group commit, the same name of vertex group 
> will conflict. While in the current equals & hashCode of VertexGroup, vertex 
> group name and members name are used.




[jira] [Commented] (TEZ-714) OutputCommitters should not run in the main AM dispatcher thread

2015-03-23 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377401#comment-14377401
 ] 

Bikas Saha commented on TEZ-714:


bq. It could, but this may make the transition complicated. We need 
to differentiate these 2 kinds of commits; besides, there are 2 possible states 
(RUNNING, COMMITTING) when the commit happens, and we also need to handle 2 
different outcomes (commit succeeded or failed), so there would be 8 
different cases in total in one transition, which may be difficult to read.
I am looking at the TaskAttemptImpl#TerminatedBeforeRunningTransition state 
transitions as inspiration. There are some standard things to do when a commit 
operation completes, e.g. decrement the outstanding commit counter. If the commit 
was a group commit, then write the recovery entry for it. If the commit fails, 
then set a flag to abort. This can be in a base transition, say 
CommitCompletedTransition. Then we can have 
CommitCompletedWhileRunningTransition, which calls the base for common code and 
does running-specific work, e.g. triggering job failure upon commit failure. 
Another transition, CommitCompletedWhileCommitting, just waits for the 
commit counter to drop to 0. Next, CommitCompletedWhileTerminating waits 
for all commit operations to complete and then calls abort (this could be 
blocking for now). 
Perhaps all commit events need to have a shared boolean that they check 
before invoking commit. This boolean could be set to false when the vertex/DAG 
decides to abort. This would make any pending commit operations complete 
quickly instead of trying to commit unnecessarily.
Some e2e scenarios could be tested via simulation using the MockDAGAppMaster: 
create custom committers that fail/pass as desired and check that the DAG 
behaves as expected.
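The structure proposed above has one base transition for the shared bookkeeping and one subclass per state. It might look roughly like the following. This is a hypothetical sketch. The transition names follow the comment, but `Vertex`, `VertexState` and `CommitCompletedEvent` are minimal stand-ins rather than Tez's VertexImpl state machine, "abort" is reduced to setting a flag, and the recovery entry is a plain string.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: stand-in types, not Tez's VertexImpl state machine.
public class CommitTransitionSketch {
    enum VertexState { RUNNING, COMMITTING, TERMINATING, SUCCEEDED, FAILED }

    /** Minimal vertex holding only the fields these transitions touch. */
    static final class Vertex {
        VertexState state;
        int outstandingCommits;
        boolean abortRequested, aborted;
        final List<String> recoveryLog = new ArrayList<>();
        Vertex(VertexState state, int outstandingCommits) {
            this.state = state;
            this.outstandingCommits = outstandingCommits;
        }
    }

    /** Posted to the dispatcher when one commit operation finishes. */
    static final class CommitCompletedEvent {
        final String commitName;
        final boolean groupCommit, succeeded;
        CommitCompletedEvent(String commitName, boolean groupCommit, boolean succeeded) {
            this.commitName = commitName;
            this.groupCommit = groupCommit;
            this.succeeded = succeeded;
        }
    }

    /** Base transition: bookkeeping common to every state a commit can complete in. */
    abstract static class CommitCompletedTransition {
        final VertexState transition(Vertex v, CommitCompletedEvent e) {
            v.outstandingCommits--;
            if (e.groupCommit && e.succeeded) {
                v.recoveryLog.add("GROUP_COMMIT_FINISHED " + e.commitName); // recovery entry
            }
            if (!e.succeeded) {
                v.abortRequested = true;
            }
            return stateSpecific(v, e);
        }
        abstract VertexState stateSpecific(Vertex v, CommitCompletedEvent e);

        /** Abort only after every outstanding commit has completed. */
        static VertexState abortWhenDrained(Vertex v) {
            if (v.outstandingCommits > 0) {
                return VertexState.TERMINATING;
            }
            v.aborted = true; // stands in for calling abort on the committers
            return VertexState.FAILED;
        }
    }

    static final class CommitCompletedWhileRunning extends CommitCompletedTransition {
        @Override
        VertexState stateSpecific(Vertex v, CommitCompletedEvent e) {
            // Running-specific: a failed commit triggers failure.
            return v.abortRequested ? abortWhenDrained(v) : VertexState.RUNNING;
        }
    }

    static final class CommitCompletedWhileCommitting extends CommitCompletedTransition {
        @Override
        VertexState stateSpecific(Vertex v, CommitCompletedEvent e) {
            if (v.abortRequested) {
                return abortWhenDrained(v);
            }
            // Otherwise just wait for the commit counter to drop to 0.
            return v.outstandingCommits == 0 ? VertexState.SUCCEEDED : VertexState.COMMITTING;
        }
    }

    static final class CommitCompletedWhileTerminating extends CommitCompletedTransition {
        @Override
        VertexState stateSpecific(Vertex v, CommitCompletedEvent e) {
            return abortWhenDrained(v);
        }
    }

    private static final Map<VertexState, CommitCompletedTransition> TRANSITIONS =
        new EnumMap<>(VertexState.class);
    static {
        TRANSITIONS.put(VertexState.RUNNING, new CommitCompletedWhileRunning());
        TRANSITIONS.put(VertexState.COMMITTING, new CommitCompletedWhileCommitting());
        TRANSITIONS.put(VertexState.TERMINATING, new CommitCompletedWhileTerminating());
    }

    /** One linear transition per state instead of one transition covering every combination. */
    static void onCommitCompleted(Vertex v, CommitCompletedEvent e) {
        CommitCompletedTransition t = TRANSITIONS.get(v.state);
        if (t == null) {
            throw new IllegalStateException("Unexpected commit completion in state " + v.state);
        }
        v.state = t.transition(v, e);
    }
}
```

Commit type and commit result are handled once in the base class. The state-specific subclasses only decide the next state, so no single transition has to enumerate every combination.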

> OutputCommitters should not run in the main AM dispatcher thread
> 
>
> Key: TEZ-714
> URL: https://issues.apache.org/jira/browse/TEZ-714
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Jeff Zhang
>Priority: Critical
> Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch, Vertex_2.pdf
>
>
> Follow up jira from TEZ-41.
> 1) If there are multiple OutputCommitters on a Vertex, they can be run in 
> parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event 
> handling, w.r.t the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.




[jira] [Comment Edited] (TEZ-714) OutputCommitters should not run in the main AM dispatcher thread

2015-03-23 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377401#comment-14377401
 ] 

Bikas Saha edited comment on TEZ-714 at 3/24/15 6:54 AM:
-

bq. It could, but this may make the transition complicated. We need 
to differentiate these 2 kinds of commits; besides, there are 2 possible states 
(RUNNING, COMMITTING) when the commit happens, and we also need to handle 2 
different outcomes (commit succeeded or failed), so there would be 8 
different cases in total in one transition, which may be difficult to read.
I am looking at the TaskAttemptImpl#TerminatedBeforeRunningTransition state 
transitions as inspiration. There are some standard things to do when a commit 
operation completes, e.g. decrement the outstanding commit counter. If the commit 
was a group commit, then write the recovery entry for it. If the commit fails, 
then set a flag to abort. This can be in a base transition, say 
CommitCompletedTransition. Then we can have 
CommitCompletedWhileRunningTransition, which calls the base for common code and 
does running-specific work, e.g. triggering job failure upon commit failure. 
Another transition, CommitCompletedWhileCommitting, just waits for the 
commit counter to drop to 0. Next, CommitCompletedWhileTerminating waits 
for all commit operations to complete and then calls abort (this could be 
blocking for now). This way we can separate things while still keeping the 
transitions essentially linear, instead of multiplying the possibilities (2 
commit types x 3 states x 2 commit results).
Perhaps all commit events need to have a shared boolean that they check 
before invoking commit. This boolean could be set to false when the vertex/DAG 
decides to abort. This would make any pending commit operations complete 
quickly instead of trying to commit unnecessarily.
Some e2e scenarios could be tested via simulation using the MockDAGAppMaster: 
create custom committers that fail/pass as desired and check that the DAG 
behaves as expected.
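The shared boolean idea could be sketched with an AtomicBoolean that every commit task reads just before committing. Names are hypothetical and the commit itself is stubbed. The flag only short-circuits commits that have not started yet. A commit that passed the check just before abort() still runs, so the abort path still has to wait for outstanding commits, as described for the terminating transition above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: names are illustrative, the commit is stubbed.
public class SharedAbortFlagSketch {
    /** One flag shared by every pending commit of a vertex or DAG. */
    private final AtomicBoolean commitAllowed = new AtomicBoolean(true);
    private final List<String> committed = Collections.synchronizedList(new ArrayList<>());

    /** Called when the vertex/DAG decides to abort; commits not yet started then return quickly. */
    public void abort() {
        commitAllowed.set(false);
    }

    /** Body of one commit task; returns false without committing once abort was requested. */
    public boolean runCommit(String outputName) {
        if (!commitAllowed.get()) {
            return false; // skipped: this output would be aborted anyway
        }
        committed.add(outputName); // stands in for the committer's actual commit call
        return true;
    }

    public List<String> committed() {
        synchronized (committed) {
            return new ArrayList<>(committed);
        }
    }
}
```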


was (Author: bikassaha):
bq. It could, but this may make the transition complicated. We need 
to differentiate these 2 kinds of commits; besides, there are 2 possible states 
(RUNNING, COMMITTING) when the commit happens, and we also need to handle 2 
different outcomes (commit succeeded or failed), so there would be 8 
different cases in total in one transition, which may be difficult to read.
I am looking at the TaskAttemptImpl#TerminatedBeforeRunningTransition state 
transitions as inspiration. There are some standard things to do when a commit 
operation completes, e.g. decrement the outstanding commit counter. If the commit 
was a group commit, then write the recovery entry for it. If the commit fails, 
then set a flag to abort. This can be in a base transition, say 
CommitCompletedTransition. Then we can have 
CommitCompletedWhileRunningTransition, which calls the base for common code and 
does running-specific work, e.g. triggering job failure upon commit failure. 
Another transition, CommitCompletedWhileCommitting, just waits for the 
commit counter to drop to 0. Next, CommitCompletedWhileTerminating waits 
for all commit operations to complete and then calls abort (this could be 
blocking for now). 
Perhaps all commit events need to have a shared boolean that they check 
before invoking commit. This boolean could be set to false when the vertex/DAG 
decides to abort. This would make any pending commit operations complete 
quickly instead of trying to commit unnecessarily.
Some e2e scenarios could be tested via simulation using the MockDAGAppMaster: 
create custom committers that fail/pass as desired and check that the DAG 
behaves as expected.

> OutputCommitters should not run in the main AM dispatcher thread
> 
>
> Key: TEZ-714
> URL: https://issues.apache.org/jira/browse/TEZ-714
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Jeff Zhang
>Priority: Critical
> Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch, Vertex_2.pdf
>
>
> Follow up jira from TEZ-41.
> 1) If there are multiple OutputCommitters on a Vertex, they can be run in 
> parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event 
> handling, w.r.t the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.



