[jira] [Updated] (TEZ-1157) Optimize broadcast :- Tasks pertaining to same job in same machine should not download multiple copies of broadcast data

2014-09-17 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1157:
-
Description: 
Currently, tasks that belong to the same job and run on the same machine each 
download their own copy of the broadcast data. An optimization would be to 
download one copy per machine and have the remaining tasks refer to that copy.

(results after this feature)

!connections.png! 

!latency.png!

  was: Currently, tasks that belong to the same job and run on the same machine 
each download their own copy of the broadcast data. An optimization would be to 
download one copy per machine and have the remaining tasks refer to that copy.
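
The per-machine sharing described above can be sketched as follows. This is a minimal illustration in Java, not the code from the attached patches: the class SharedFetchSketch, the Downloader interface, and the cache-directory layout are all hypothetical. It assumes every task on a machine can see a common local directory, and it uses a file lock so that only one task downloads a given piece of broadcast data while the others reuse the published file.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

final class SharedFetchSketch {
    /** Hypothetical stand-in for the remote fetch of one broadcast output. */
    interface Downloader {
        byte[] download(String key) throws IOException;
    }

    // File locks are held on behalf of the whole JVM, so threads inside one
    // JVM are serialized separately with an ordinary monitor.
    private static final Object JVM_LOCK = new Object();

    private final Path cacheDir;
    private final Downloader downloader;

    SharedFetchSketch(Path cacheDir, Downloader downloader) {
        this.cacheDir = cacheDir;
        this.downloader = downloader;
    }

    /** Returns the local copy for key, downloading it only if no task has published it yet. */
    Path fetch(String key) throws IOException {
        Files.createDirectories(cacheDir);
        Path target = cacheDir.resolve(key + ".data");
        if (Files.exists(target)) {
            return target; // fast path: another task already published the copy
        }
        Path lockFile = cacheDir.resolve(key + ".lock");
        synchronized (JVM_LOCK) {
            try (FileChannel channel = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
                 FileLock lock = channel.lock()) {
                // Re-check under the lock: a task in another process may have
                // finished the download while this one was waiting.
                if (!Files.exists(target)) {
                    Path tmp = Files.createTempFile(cacheDir, key, ".tmp");
                    Files.write(tmp, downloader.download(key));
                    // Publish atomically so readers never see a partial file.
                    Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
                }
                return target;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("broadcast-cache");
        int[] downloads = {0};
        SharedFetchSketch cache = new SharedFetchSketch(dir, key -> {
            downloads[0]++; // count how often the "remote" fetch really runs
            return ("payload for " + key).getBytes(StandardCharsets.UTF_8);
        });
        // Four simulated tasks ask for the same broadcast data.
        for (int task = 0; task < 4; task++) {
            cache.fetch("broadcast-0");
        }
        System.out.println("downloads performed: " + downloads[0]);
    }
}
```

In this sketch the four simulated tasks trigger a single download; the later calls return the file that the first call published. Real tasks would run in separate processes, each with its own instance pointing at the same directory.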


> Optimize broadcast :- Tasks pertaining to same job in same machine should not 
> download multiple copies of broadcast data
> 
>
> Key: TEZ-1157
> URL: https://issues.apache.org/jira/browse/TEZ-1157
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Rajesh Balamohan
> Assignee: Gopal V
> Labels: performance
> Fix For: 0.6.0
>
> Attachments: TEZ-1152.WIP.patch, TEZ-1157.10.patch, 
> TEZ-1157.3.WIP.patch, TEZ-1157.4.WIP.patch, TEZ-1157.5.WIP.patch, 
> TEZ-1157.6.patch, TEZ-1157.7.patch, TEZ-1157.8.patch, TEZ-1157.9.patch, 
> TEZ-broadcast-shuffle+vertex-parallelism.patch, connections.png, latency.png
>
>
> Currently, tasks that belong to the same job and run on the same machine each 
> download their own copy of the broadcast data. An optimization would be to 
> download one copy per machine and have the remaining tasks refer to that copy.
> (results after this feature)
> !connections.png! 
> !latency.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1157) Optimize broadcast :- Tasks pertaining to same job in same machine should not download multiple copies of broadcast data

2014-09-17 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1157:
-
Attachment: TEZ-1157.10.patch

Fix TestTezRuntimeConfiguration failures for missing keys.

This is the patch for commit.



[jira] [Updated] (TEZ-1157) Optimize broadcast :- Tasks pertaining to same job in same machine should not download multiple copies of broadcast data

2014-09-17 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1157:
-
Attachment: connections.png, latency.png

On a cluster with 2700 independent slots, this patch produced performance 
improvements that, at the top end, reduced a 38-minute job to a 2-minute job.

Total connections established during runtime (smaller is better)

!connections.png!

Total latency observed for entire DAG

!latency.png!



[jira] [Updated] (TEZ-1157) Optimize broadcast :- Tasks pertaining to same job in same machine should not download multiple copies of broadcast data

2014-09-17 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1157:
-
Attachment: TEZ-1157.9.patch



[jira] [Updated] (TEZ-1157) Optimize broadcast :- Tasks pertaining to same job in same machine should not download multiple copies of broadcast data

2014-09-17 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1157:
-
Attachment: TEZ-1157.8.patch



[jira] [Updated] (TEZ-1157) Optimize broadcast :- Tasks pertaining to same job in same machine should not download multiple copies of broadcast data

2014-09-17 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1157:
-
Attachment: TEZ-1157.7.patch



[jira] [Updated] (TEZ-1157) Optimize broadcast :- Tasks pertaining to same job in same machine should not download multiple copies of broadcast data

2014-09-10 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1157:
-
Attachment: TEZ-1157.6.patch



[jira] [Updated] (TEZ-1157) Optimize broadcast :- Tasks pertaining to same job in same machine should not download multiple copies of broadcast data

2014-08-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1157:
-

Attachment: TEZ-1157.5.WIP.patch




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (TEZ-1157) Optimize broadcast :- Tasks pertaining to same job in same machine should not download multiple copies of broadcast data

2014-08-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1157:
-

Attachment: TEZ-1157.4.WIP.patch

Next WIP patch for review.



[jira] [Updated] (TEZ-1157) Optimize broadcast :- Tasks pertaining to same job in same machine should not download multiple copies of broadcast data

2014-08-25 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1157:
-

Attachment: TEZ-1157.3.WIP.patch
