[jira] [Commented] (TEZ-3115) Shuffle string handling adds significant memory overhead

2016-03-02 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176511#comment-15176511
 ] 

TezQA commented on TEZ-3115:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12791002/TEZ-3115.4.patch
  against master revision ac0fd8b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestFaultTolerance

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1536//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1536//console

This message is automatically generated.

> Shuffle string handling adds significant memory overhead
> 
>
> Key: TEZ-3115
> URL: https://issues.apache.org/jira/browse/TEZ-3115
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Jason Lowe
>Assignee: Jonathan Eagles
> Fix For: 0.7.1, 0.8.3
>
> Attachments: TEZ-3115.1.patch, TEZ-3115.2.patch, 
> TEZ-3115.3-branch-0.7.patch, TEZ-3115.3.patch, TEZ-3115.4-branch-0.7.patch, 
> TEZ-3115.4.patch
>
>
> While investigating the OOM heap dump from TEZ-3114 I noticed that the 
> ShuffleManager and other shuffle-related objects were holding onto many 
> strings that added up to over a hundred megabytes of memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3115) Shuffle string handling adds significant memory overhead

2016-03-02 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176472#comment-15176472
 ] 

TezQA commented on TEZ-3115:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12791002/TEZ-3115.4.patch
  against master revision ac0fd8b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1535//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1535//console

This message is automatically generated.


[jira] [Commented] (TEZ-3115) Shuffle string handling adds significant memory overhead

2016-03-02 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176305#comment-15176305
 ] 

Siddharth Seth commented on TEZ-3115:
-

+1. Thanks [~jeagles].
I wasn't aware of the improvements to interning in Java 7 etc. I suppose 
either can be used in that case.


[jira] [Commented] (TEZ-3115) Shuffle string handling adds significant memory overhead

2016-03-01 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174281#comment-15174281
 ] 

Siddharth Seth commented on TEZ-3115:
-

I think the interning needs to be done via StringInterner.weakIntern()?

The rest looks good. Minor: could you please add a toString method on the new 
classes - HostPort, PathPartition, HostPortPartition?
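
For readers unfamiliar with weak interning: the idea is to keep one canonical 
copy of each distinct string while still letting the garbage collector reclaim 
copies nobody uses any more. The block below is only a minimal JDK-only sketch 
of that idea. SimpleWeakInterner is a hypothetical name introduced here for 
illustration; it is not the StringInterner utility mentioned above and not 
code from the patch.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical illustration of weak interning; not the StringInterner
// class referred to in the comment and not part of the TEZ-3115 patch.
final class SimpleWeakInterner {
    // Keys are held weakly by WeakHashMap; values are weak as well so the
    // map itself never keeps a string alive.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    /** Returns the canonical instance equal to s, registering s if none exists. */
    synchronized String intern(String s) {
        if (s == null) {
            return null;
        }
        WeakReference<String> ref = pool.get(s);
        String canonical = (ref == null) ? null : ref.get();
        if (canonical == null) {
            // First sighting, or the earlier copy was already collected.
            pool.put(s, new WeakReference<>(s));
            canonical = s;
        }
        return canonical;
    }
}
```

Callers would pass every freshly parsed host or path component through 
intern() and keep only the returned reference, so equal strings collapse to a 
single object for as long as something still uses them.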


[jira] [Commented] (TEZ-3115) Shuffle string handling adds significant memory overhead

2016-02-29 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172321#comment-15172321
 ] 

TezQA commented on TEZ-3115:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12790511/TEZ-3115.3.patch
  against master revision 18398c8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.dag.app.rm.TestContainerReuse

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1531//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1531//console

This message is automatically generated.


[jira] [Commented] (TEZ-3115) Shuffle string handling adds significant memory overhead

2016-02-29 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171663#comment-15171663
 ] 

Rajesh Balamohan commented on TEZ-3115:
---

Minor:
- Should FetcherOrderedGrouped call 
ShuffleUtils.constructBaseURIForShuffleHandler directly instead of going 
through another level of indirection (to avoid the additional string created 
for host + ":" + port)?


[jira] [Commented] (TEZ-3115) Shuffle string handling adds significant memory overhead

2016-02-26 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169463#comment-15169463
 ] 

Jonathan Eagles commented on TEZ-3115:
--

[~sseth], can you review this patch? The findbugs warnings are due to 
TEZ-1911 and TEZ-3077. The javac warning is expected for this scenario.


[jira] [Commented] (TEZ-3115) Shuffle string handling adds significant memory overhead

2016-02-26 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169414#comment-15169414
 ] 

TezQA commented on TEZ-3115:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12790149/TEZ-3115.2.patch
  against master revision 923f7b4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 31 javac 
compiler warnings (more than the master's current 30 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1520//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1520//artifact/patchprocess/newPatchFindbugsWarningstez-api.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1520//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-library.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1520//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1520//console

This message is automatically generated.


[jira] [Commented] (TEZ-3115) Shuffle string handling adds significant memory overhead

2016-02-26 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169266#comment-15169266
 ] 

Jonathan Eagles commented on TEZ-3115:
--

Patch 2 summary.

- Host and attempt are now the fundamental storage types. Created several 
subtypes that allow us to intern the host and path component immediately after 
processing the DataMovementEvent. This lets us reduce to a single copy not 
only the exact strings but also their derivatives (host -> host, 
host-port, host-port-partition), (path component -> path component, path 
component-partition).
- There are a few non-string handling scenarios that still need improvements 
(extremely large auto-reduce parallelism, and a large number of empty 
partitions). Filed TEZ-3144 and TEZ-3145 to address those scenarios.
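
To make the "string derivatives" point concrete, here is a rough sketch of how 
typed keys can share one interned host instead of each holding its own 
concatenated string. The class names HostPort and HostPortPartition come from 
the review comments in this thread, but every field, constructor and method 
below is an assumption made for illustration and may differ from the patch.

```java
import java.util.Objects;

// Hypothetical sketch; bodies are assumptions, not the TEZ-3115 patch.
final class HostPort {
    private final String host; // interned once, shared by all derived keys
    private final int port;    // kept as an int rather than as text

    HostPort(String host, int port) {
        this.host = host.intern();
        this.port = port;
    }

    String getHost() { return host; }
    int getPort() { return port; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof HostPort)) return false;
        HostPort other = (HostPort) o;
        return port == other.port && host.equals(other.host);
    }

    @Override public int hashCode() { return Objects.hash(host, port); }

    @Override public String toString() { return host + ":" + port; }
}

final class HostPortPartition {
    private final HostPort hostPort; // reference to the shared key, no string copy
    private final int partition;

    HostPortPartition(HostPort hostPort, int partition) {
        this.hostPort = hostPort;
        this.partition = partition;
    }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof HostPortPartition)) return false;
        HostPortPartition other = (HostPortPartition) o;
        return partition == other.partition && hostPort.equals(other.hostPort);
    }

    @Override public int hashCode() { return Objects.hash(hostPort, partition); }

    // The text form is built only when someone asks for it (e.g. logging).
    @Override public String toString() { return hostPort + ":" + partition; }
}
```

With this shape, a map keyed by HostPortPartition stores one small object per 
partition plus a single host string per node, rather than one full 
host:port:partition string per entry.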


[jira] [Commented] (TEZ-3115) Shuffle string handling adds significant memory overhead

2016-02-16 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149130#comment-15149130
 ] 

Siddharth Seth commented on TEZ-3115:
-

Some strings which could be interned
- Intern pathComponent (ShuffleInputEventHandler, InputAttemptIdentifier, 
MapHost, etc)
- Intern hostIdentifier in MapHost (and wherever else it is created). Can 
potentially avoid storing this - however, it doesn't seem like there's an 
explosion of strings here - since it's just host:port
- Intern the hostname

The biggest offender will however continue to be host:port:partition when 
reduce parallelism kicks in. That should not be linked to the host in any way - 
however, I think that change should be in a separate jira - since it affects 
functionality quite a bit.
Another side effect of using the partitionId to identify the host is that we 
can end up with multiple parallel fetches from the same host - which is 
otherwise explicitly avoided in the Ordered Shuffle. That could be leading to 
overloaded nodes as well.
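
As a small illustration of why interning helps here: two host:port strings 
built at runtime are separate objects even when their contents are equal, and 
interning collapses them to one. The sketch below uses the JDK's 
String.intern(); InternDemo and its methods are hypothetical names, and which 
interning mechanism suits the shuffle code is a separate question.

```java
// Hypothetical demo class; only String.intern() is a real JDK API here.
final class InternDemo {
    /** Builds "host:port" at runtime, so each call yields a new String object. */
    static String hostIdentifier(String host, int port) {
        return new StringBuilder(host).append(':').append(port).toString();
    }

    /** Same text, but every equal result refers to one shared instance. */
    static String internedHostIdentifier(String host, int port) {
        return hostIdentifier(host, port).intern();
    }
}
```

Holding the result of internedHostIdentifier in long-lived structures means 
thousands of events from the same node retain a single string rather than 
thousands of equal copies.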


[jira] [Commented] (TEZ-3115) Shuffle string handling adds significant memory overhead

2016-02-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142893#comment-15142893
 ] 

Jason Lowe commented on TEZ-3115:
-

When auto-parallelism kicks in we're going to see many copies of the same 
upstream task attempt IDs, host:port, etc.  We should at least consider 
interning or otherwise sharing these, or potentially just storing the raw ID 
and generating the string when necessary on-the-fly.  MapHost is another 
example of many redundancies, since it stores the fully qualified host name and 
port at least three times (as part of baseUrl, identifier, and hostIdentifier). 
 I wonder if it would be better overall to have MapHost be more efficiently 
stored and generate the URLs and identifiers on-demand.
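
A rough sketch of the "store once, generate on demand" idea follows. 
CompactMapHost is a hypothetical class written for this illustration, not the 
existing MapHost, and the URL layout in getBaseUrl is an assumed placeholder 
rather than the real shuffle handler URL format.

```java
// Hypothetical sketch; not the Tez MapHost class.
final class CompactMapHost {
    private final String host;   // the only stored copy of the host name
    private final int port;      // stored as an int, not embedded in strings
    private final int partition;

    CompactMapHost(String host, int port, int partition) {
        this.host = host;
        this.port = port;
        this.partition = partition;
    }

    /** "host:port", built when requested instead of being kept as a field. */
    String getHostIdentifier() {
        return host + ":" + port;
    }

    /** "host:port:partition", built when requested. */
    String getIdentifier() {
        return getHostIdentifier() + ":" + partition;
    }

    /** Assumed URL shape, for illustration only. */
    String getBaseUrl(String scheme) {
        return scheme + "://" + host + ":" + port + "/";
    }
}
```

The trade-off is that each call allocates a short-lived string, so this suits 
call sites that use the value briefly (building a request, logging) rather 
than ones that store the result.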