[jira] [Commented] (TEZ-3980) ShuffleRunner: the wake loop needs to check for shutdown

2018-08-27 Thread Gunther Hagleitner (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594400#comment-16594400
 ] 

Gunther Hagleitner commented on TEZ-3980:
-

+1

> ShuffleRunner: the wake loop needs to check for shutdown
> 
>
> Key: TEZ-3980
> URL: https://issues.apache.org/jira/browse/TEZ-3980
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Gopal V
> Assignee: Gopal V
> Priority: Major
> Attachments: TEZ-3980.1.patch
>
>
> In the ShuffleRunner threads, there is a loop that does not terminate if the
> task threads get killed.
> {code}
>   while ((runningFetchers.size() >= numFetchers || pendingHosts.isEmpty())
>       && numCompletedInputs.get() < numInputs) {
>     inputContext.notifyProgress();
>     boolean ret = wakeLoop.await(1000, TimeUnit.MILLISECONDS);
>   }
> {code}
> A wakeLoop signal does not break out of this loop, and the loop is missing a
> break on shutdown.
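
For readers of the archive, here is a minimal sketch of the kind of shutdown check the description asks for. It is not the contents of TEZ-3980.1.patch, which is not reproduced in this thread. The class WakeLoopSketch, the isShutdown flag, and the shutdown() and exitsAfterShutdown() helpers are hypothetical names for illustration, and the loop condition is reduced to the input-count check so the example stays self-contained.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the shuffle runner; this is NOT the Tez class.
class WakeLoopSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition wakeLoop = lock.newCondition();
  // Assumed shutdown flag; the name is illustrative only.
  private final AtomicBoolean isShutdown = new AtomicBoolean(false);
  private final AtomicInteger numCompletedInputs = new AtomicInteger(0);
  // Never reached in this demo, so only shutdown can end the loop.
  private final int numInputs = 1;

  // Waits for work, but leaves the loop once shutdown has been requested.
  void runWakeLoop() throws InterruptedException {
    lock.lock();
    try {
      while (numCompletedInputs.get() < numInputs) {
        if (isShutdown.get()) {
          break; // the exit that the loop in the description lacks
        }
        // Wakes on a signal or after the timeout, then re-checks the flag.
        wakeLoop.await(1000, TimeUnit.MILLISECONDS);
      }
    } finally {
      lock.unlock();
    }
  }

  // Sets the flag first, then signals so a waiting thread re-checks it at once.
  void shutdown() {
    isShutdown.set(true);
    lock.lock();
    try {
      wakeLoop.signalAll();
    } finally {
      lock.unlock();
    }
  }

  // Starts a runner thread, requests shutdown, and reports whether it exited.
  static boolean exitsAfterShutdown(long timeoutMs) {
    WakeLoopSketch sketch = new WakeLoopSketch();
    Thread runner = new Thread(() -> {
      try {
        sketch.runWakeLoop();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "wake-loop-sketch");
    runner.setDaemon(true);
    runner.start();
    sketch.shutdown();
    try {
      runner.join(timeoutMs);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !runner.isAlive();
  }

  public static void main(String[] args) {
    System.out.println("exited after shutdown: " + exitsAfterShutdown(5000));
  }
}
```

Checking the flag inside the loop body means a signal on wakeLoop is enough to end the wait. Without the check, a signalled thread only re-evaluates the original condition and goes back to waiting.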



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TEZ-3980) ShuffleRunner: the wake loop needs to check for shutdown

2018-08-22 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589304#comment-16589304
 ] 

Sergey Shelukhin commented on TEZ-3980:
---

+1 non-binding



[jira] [Commented] (TEZ-3980) ShuffleRunner: the wake loop needs to check for shutdown

2018-08-16 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582757#comment-16582757
 ] 

Gopal V commented on TEZ-3980:
--

Found while testing LLAP task pre-emption.

When reducers doing the unsorted shuffle join (or the bloom filter semi-join)
are pre-empted, they leave behind a shuffle runner thread.

Once 32k threads have leaked, the process fails with a "cannot create thread"
error in some other, random IPC thread.



[jira] [Commented] (TEZ-3980) ShuffleRunner: the wake loop needs to check for shutdown

2018-08-16 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582752#comment-16582752
 ] 

Gopal V commented on TEZ-3980:
--

ShuffleScheduler already checks shutdown.get() and breaks inside its loop
(it also uses a thread wait). Right now this bug is specific to ShuffleManager.



[jira] [Commented] (TEZ-3980) ShuffleRunner: the wake loop needs to check for shutdown

2018-08-16 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582709#comment-16582709
 ] 

Kuhu Shukla commented on TEZ-3980:
--

[~gopalv], just curious: how did you encounter this issue? Did it cause a hang?
Any details would be valuable, as we are investigating some other bugs in and
around that code base at the moment.



[jira] [Commented] (TEZ-3980) ShuffleRunner: the wake loop needs to check for shutdown

2018-08-16 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582598#comment-16582598
 ] 

Kuhu Shukla commented on TEZ-3980:
--

Good catch, [~gopalv]. Do we need an equivalent change in ShuffleScheduler as
well (the ordered case)?



[jira] [Commented] (TEZ-3980) ShuffleRunner: the wake loop needs to check for shutdown

2018-08-16 Thread TezQA (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582037#comment-16582037
 ] 

TezQA commented on TEZ-3980:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12935798/TEZ-3980.1.patch
  against master revision 90c8195.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2892//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2892//console

This message is automatically generated.

