[jira] [Commented] (TEZ-3206) Have unordered partitioned KV output send partition stats via VertexManagerEvent

2016-07-01 Thread Lalitha Viswanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359844#comment-15359844
 ] 

Lalitha Viswanathan commented on TEZ-3206:
--

Tried applying the available patch from the TEZ-3303 jira, but didn't see the 
change. Will wait for that jira to be resolved. Thanks.

> Have unordered partitioned KV output send partition stats via 
> VertexManagerEvent 
> -
>
> Key: TEZ-3206
> URL: https://issues.apache.org/jira/browse/TEZ-3206
> Project: Apache Tez
> Issue Type: New Feature
> Reporter: Ming Ma
> Assignee: Ming Ma
> Fix For: 0.9.0
>
> Attachments: TEZ-3206-2.patch, TEZ-3206-3.patch, TEZ-3206-4.patch, 
> TEZ-3206.patch
>
>
> As part of the auto-parallelism feature, ordered partitioned KV output's 
> partition stats are sent to ShuffleVertexManager via VertexManagerEvent. But 
> this isn't available for unordered partitioned output. Having 
> {{UnorderedPartitionedKVWriter}} send partition stats will enable the 
> auto-parallelism support for unordered KV or other custom data routing 
> mechanisms that depend on partition size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3206) Have unordered partitioned KV output send partition stats via VertexManagerEvent

2016-07-01 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359819#comment-15359819
 ] 

Ming Ma commented on TEZ-3206:
--

[~lmv], you might need to wait for TEZ-3303.



[jira] [Commented] (TEZ-3206) Have unordered partitioned KV output send partition stats via VertexManagerEvent

2016-07-01 Thread Lalitha Viswanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359810#comment-15359810
 ] 

Lalitha Viswanathan commented on TEZ-3206:
--

Hi, I wanted to use the hive.tez.auto.reducer.parallelism=true feature with 
shuffle hash join (hive.optimize.dynamic.partition.hashjoin=true). I cloned the 
master branch, compiled and deployed the 0.9-SNAPSHOT binaries of Tez, and 
retried, but I don't see the feature taking effect. Am I missing something? 
Thanks!



[jira] [Commented] (TEZ-3206) Have unordered partitioned KV output send partition stats via VertexManagerEvent

2016-05-23 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297272#comment-15297272
 ] 

TezQA commented on TEZ-3206:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12805747/TEZ-3206-4.patch
  against master revision cc68f7b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1747//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1747//console

This message is automatically generated.



[jira] [Commented] (TEZ-3206) Have unordered partitioned KV output send partition stats via VertexManagerEvent

2016-05-23 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297144#comment-15297144
 ] 

Ming Ma commented on TEZ-3206:
--

Thanks [~sseth]!



[jira] [Commented] (TEZ-3206) Have unordered partitioned KV output send partition stats via VertexManagerEvent

2016-05-23 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297116#comment-15297116
 ] 

TezQA commented on TEZ-3206:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12805730/TEZ-3206-3.patch
  against master revision cc68f7b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1746//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1746//console

This message is automatically generated.



[jira] [Commented] (TEZ-3206) Have unordered partitioned KV output send partition stats via VertexManagerEvent

2016-05-23 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297080#comment-15297080
 ] 

Siddharth Seth commented on TEZ-3206:
-

+1. Thanks [~mingma] - Will commit once the pre-commit build completes. qq. 
Does the timeout on the test have to be that large - 10. Could it maybe be 
reduced to 30-4 ?



[jira] [Commented] (TEZ-3206) Have unordered partitioned KV output send partition stats via VertexManagerEvent

2016-05-20 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294021#comment-15294021
 ] 

Siddharth Seth commented on TEZ-3206:
-

I don't think the VertexManager is set up to handle multiple VertexManagerEvents 
per task. In the pipelined shuffle case, the ordered output sends out a single 
VertexManagerEvent, and only when it's the final event being sent out. We'll 
likely need to do the same here.
Other than that, the patch looks good to me. It's simpler than the first patch :)
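
To make the "single event on the final spill" idea concrete, here is a minimal sketch. It is not the Tez implementation: the class {{SingleStatsEventEmitter}} and its methods are hypothetical names, and the emitted "event" is just a long array standing in for a VertexManagerEvent payload.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Tez code: accumulate per-partition sizes across
// spills and emit exactly one stats report, attached to the final spill.
final class SingleStatsEventEmitter {
    private final long[] totals;
    private final List<long[]> emitted = new ArrayList<>();

    SingleStatsEventEmitter(int numPartitions) {
        this.totals = new long[numPartitions];
    }

    // Called once per spill; only the final spill triggers a stats report.
    void onSpill(long[] spillSizes, boolean isFinalSpill) {
        if (spillSizes.length != totals.length) {
            throw new IllegalArgumentException("partition count mismatch");
        }
        for (int i = 0; i < totals.length; i++) {
            totals[i] += spillSizes[i];
        }
        if (isFinalSpill) {
            emitted.add(totals.clone());
        }
    }

    // Stand-in for the events that would be handed to the output context.
    List<long[]> emittedEvents() {
        return emitted;
    }
}
```

With this shape the vertex manager sees one report per task attempt regardless of how many intermediate spills were pipelined.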



[jira] [Commented] (TEZ-3206) Have unordered partitioned KV output send partition stats via VertexManagerEvent

2016-05-12 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281969#comment-15281969
 ] 

TezQA commented on TEZ-3206:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12803714/TEZ-3206-2.patch
  against master revision f70aa17.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1719//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1719//console

This message is automatically generated.



[jira] [Commented] (TEZ-3206) Have unordered partitioned KV output send partition stats via VertexManagerEvent

2016-04-27 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261573#comment-15261573
 ] 

Siddharth Seth commented on TEZ-3206:
-

Comments on the patch:
- In SpillCallback.onSuccess, 
updateGlobalSizePerPartition(result.wrappedBuffer) is invoked after the 
wrappedBuffer has been reset.
- When a buffer is not being used (e.g. single partition), I think it'll be 
better not to set the size in a wrappedBuffer instance. (With a single 
partition and no buffers, we should ideally not even have created the 
wrappedBuffer; this becomes tougher to fix if the size stats are always set in 
wrappedBuffer.) Instead, updateGlobalSizePerPartition could just accept a long 
array, which either comes from the buffer or is set up explicitly.
- An additional test for the pipelined case would be useful.

When using this, one thing to note is the possibility of data from the same 
task being repeated in case of retries.

The patch ends up sending estimates. I'm not sure how much difference real 
sizes will make in the use case you are targeting, but that could be an option: 
send estimates or send real sizes. The VMEvent could be modified to indicate 
which one is being sent. 
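
A rough sketch of the two points above (accepting a plain long array, and guarding against repeated data from retries) could look like the following. The class names are hypothetical and this is not the code in the patch; in particular, the "latest report per task wins" rule is only one assumed way a consumer might handle retries.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the patch itself: the accumulator takes a plain
// long[] so callers need not create a wrapped buffer just to report sizes.
final class PartitionSizeAccumulator {
    private final long[] globalSizePerPartition;

    PartitionSizeAccumulator(int numPartitions) {
        this.globalSizePerPartition = new long[numPartitions];
    }

    void update(long[] sizes) {
        if (sizes.length != globalSizePerPartition.length) {
            throw new IllegalArgumentException("partition count mismatch");
        }
        for (int i = 0; i < sizes.length; i++) {
            globalSizePerPartition[i] += sizes[i];
        }
    }

    long[] snapshot() {
        return globalSizePerPartition.clone();
    }
}

// Assumed consumer-side guard: keep only the latest report per source task,
// so a retried task replaces its earlier stats instead of adding to them.
final class RetryAwareStats {
    private final int numPartitions;
    private final Map<Integer, long[]> latestByTask = new HashMap<>();

    RetryAwareStats(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    void onReport(int taskIndex, long[] sizes) {
        latestByTask.put(taskIndex, sizes.clone());
    }

    long[] totals() {
        long[] sum = new long[numPartitions];
        for (long[] sizes : latestByTask.values()) {
            for (int i = 0; i < numPartitions; i++) {
                sum[i] += sizes[i];
            }
        }
        return sum;
    }
}
```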




[jira] [Commented] (TEZ-3206) Have unordered partitioned KV output send partition stats via VertexManagerEvent

2016-04-19 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248748#comment-15248748
 ] 

TezQA commented on TEZ-3206:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12799570/TEZ-3206.patch
  against master revision 53aa661.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.dag.app.rm.TestContainerReuse

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1663//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1663//console

This message is automatically generated.



[jira] [Commented] (TEZ-3206) Have unordered partitioned KV output send partition stats via VertexManagerEvent

2016-04-18 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246517#comment-15246517
 ] 

Siddharth Seth commented on TEZ-3206:
-

As has been pointed out here, the impact of 4 bytes per partition per message 
is a lot higher on the AM. All sources * 4 bytes * #numPartitions is what the 
AM will end up requiring, since it stores all the events.

bq. RoaringBitmap isn't accurate, but it seems good enough for the 
auto-parallelism. But it doesn't work well for data routing that depends on 
more accurate partition stats.
[~mingma] - Is RoaringBitmap itself inaccurate, or is it the way we attempt to 
make use of fewer bits that is inherently lossy?
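
As a worked instance of the "all sources * 4 bytes * #numPartitions" estimate, the sketch below plugs in figures that appear elsewhere in this thread (100k mappers, 20k reducers). Those inputs are illustrative only, and {{AmStatsFootprint}} is a hypothetical helper, not part of Tez.

```java
// Hypothetical helper, not Tez code: raw payload bytes the AM would retain if
// it stored one fixed-width size per partition from every source task.
final class AmStatsFootprint {
    static long retainedBytes(long numSourceTasks, long numPartitions, int bytesPerPartition) {
        // multiplyExact throws ArithmeticException on overflow instead of wrapping.
        return Math.multiplyExact(
                Math.multiplyExact(numSourceTasks, numPartitions),
                (long) bytesPerPartition);
    }

    public static void main(String[] args) {
        // 100,000 sources * 20,000 partitions * 4 bytes = 8,000,000,000 bytes.
        System.out.println(retainedBytes(100_000L, 20_000L, 4));
    }
}
```

This counts payload bytes only; object headers and event wrappers in the AM would add to it.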



[jira] [Commented] (TEZ-3206) Have unordered partitioned KV output send partition stats via VertexManagerEvent

2016-04-15 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243966#comment-15243966
 ] 

Jonathan Eagles commented on TEZ-3206:
--

cc [~jlowe].



[jira] [Commented] (TEZ-3206) Have unordered partitioned KV output send partition stats via VertexManagerEvent

2016-04-15 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243956#comment-15243956
 ] 

Ming Ma commented on TEZ-3206:
--

Another concern with DataMovementEvent is its impact on AM memory. It appears 
the reducer vertex stores DMEs and other events from mappers. So for the 
auto-parallelism case, where there are many empty partitions, the overall event 
buffer size could be large.

In addition, a DME's empty-partition payload lists all empty partitions from a 
specific mapper. A reducer only cares about the partitions it is responsible 
for, not partitions for other reducers. Perhaps we can optimize the AM to send 
only the relevant empty partitions to each reducer. Also, for the 
auto-parallelism case, instead of sending one DME to the reducer at a time, the 
AM could send batched events similar to {{CompositeDataMovementEvent}}.

Anyway, there are a couple of issues discussed here. I will use this jira to 
add the VertexManagerEvent routing from {{UnorderedPartitionedKVWriter}} to the 
Tez AM. We can discuss the following in other jiras:

* Use more precise partition stats in VertexManagerEvent.
* Optimize DME routing between AM and downstream vertex.
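
The idea of sending each reducer only its own slice of a mapper's empty-partition list can be sketched as below. This is an assumption-laden illustration: it uses {{java.util.BitSet}} as a stand-in for whatever encoding the DME payload actually uses, assumes a reducer owns a contiguous partition range, and {{EmptyPartitionFilter}} is a hypothetical name.

```java
import java.util.BitSet;

// Hypothetical sketch, not Tez code: trim a mapper's empty-partition flags
// down to the partitions one reducer is responsible for.
final class EmptyPartitionFilter {
    // Returns flags for partitions [fromPartition, toPartition), re-indexed
    // so that fromPartition becomes bit 0 in the result.
    static BitSet relevantTo(BitSet emptyPartitions, int fromPartition, int toPartition) {
        return emptyPartitions.get(fromPartition, toPartition);
    }
}
```

For example, if a mapper reports partitions 1, 5, 6 and 900 as empty and a reducer owns partitions 4 to 7, the reducer would receive only the two flags for partitions 5 and 6.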



[jira] [Commented] (TEZ-3206) Have unordered partitioned KV output send partition stats via VertexManagerEvent

2016-04-14 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15242242#comment-15242242
 ] 

Ming Ma commented on TEZ-3206:
--

Thanks [~jeagles]! AFAIK, partition statistics are sent via VertexManagerEvent, 
which stops at the AM, while the empty partition list is sent via 
DataMovementEvent and routed to the AM and then to reducers. So the size of the 
partition statistics shouldn't impact any reducer.

For the DataMovementEvent task OOM case, is it because each reducer gets 
launched after all 100k mappers have finished and thus fills up its event 
queue? I assume the same thing could happen to the AM, e.g., VertexManagerEvents 
overflowing the AM's event queue. That is less likely, though, as it requires 
all 100k mappers to finish at the same time or at a faster rate than the AM 
async dispatcher can process.



[jira] [Commented] (TEZ-3206) Have unordered partitioned KV output send partition stats via VertexManagerEvent

2016-04-12 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238664#comment-15238664
 ] 

Jonathan Eagles commented on TEZ-3206:
--

I have to look more closely at how the partition stats are routed, but the RPC 
message size of DataMovementEvents that are routed to downstream tasks is 
highly sensitive. Some typical jobs I have seen may send 100,000 DMEs or more 
to each reducer. At 80KB per message, that will OOM the task.



[jira] [Commented] (TEZ-3206) Have unordered partitioned KV output send partition stats via VertexManagerEvent

2016-04-12 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238571#comment-15238571
 ] 

Ming Ma commented on TEZ-3206:
--

The current implementation in ordered partitioned KV output uses RoaringBitmap 
for a rough estimate of the partition size. Is this optimization necessary? a) 
VertexManagerEvent is sent when it spills or output is closed, so the frequency 
is relatively low. b) Assuming we don't use a bitmap and instead use 4 bytes 
for each partition size, with 20k reducers that is 80KB in size, which is not 
large for RPC.

RoaringBitmap isn't accurate, but it seems good enough for auto-parallelism. 
It doesn't work well, however, for data routing that depends on more accurate 
partition stats.
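
The trade-off above can be illustrated with a small sketch that contrasts the exact 4-bytes-per-partition payload with a bucketed bitmap encoding. Everything here is assumed for illustration: {{PartitionStatsSizing}} is a hypothetical class, the bucket boundaries are made up, and {{java.util.BitSet}} stands in for RoaringBitmap. It is not the encoding Tez uses.

```java
import java.util.BitSet;

// Hypothetical sketch, not Tez code: exact vs. bucketed partition stats.
final class PartitionStatsSizing {
    // Assumed bucket upper bounds in MB; sizes at or above the last bound
    // fall into one extra top bucket.
    static final long[] BUCKET_UPPER_MB = {1, 10, 100};

    // Payload bytes when one 4-byte size is reported per partition.
    static long exactPayloadBytes(int numPartitions) {
        return 4L * numPartitions;
    }

    // Maps a byte size to a bucket index in 0..BUCKET_UPPER_MB.length.
    static int bucketOf(long sizeBytes) {
        long mb = sizeBytes / (1024L * 1024L);
        for (int i = 0; i < BUCKET_UPPER_MB.length; i++) {
            if (mb < BUCKET_UPPER_MB[i]) {
                return i;
            }
        }
        return BUCKET_UPPER_MB.length;
    }

    // Sets one bit per partition at index (partition * numBuckets + bucket).
    static BitSet encode(long[] sizes) {
        int numBuckets = BUCKET_UPPER_MB.length + 1;
        BitSet bits = new BitSet(sizes.length * numBuckets);
        for (int p = 0; p < sizes.length; p++) {
            bits.set(p * numBuckets + bucketOf(sizes[p]));
        }
        return bits;
    }
}
```

Under these assumed buckets a 5 MB partition and a 9 MB partition encode identically, which is the kind of loss that matters for size-dependent data routing but not much for deciding reducer parallelism.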



[jira] [Commented] (TEZ-3206) Have unordered partitioned KV output send partition stats via VertexManagerEvent

2016-04-08 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15233194#comment-15233194
 ] 

Hitesh Shah commented on TEZ-3206:
--

cc [~rajesh.balamohan] [~sseth]
