[jira] [Issue Comment Deleted] (TEZ-4067) Tez Speculation decision is calculated on each update by the dispatcher

2019-05-30 Thread Ahmed Hussein (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated TEZ-4067:
---
Comment: was deleted

(was: TEZ-1897)

> Tez Speculation decision is calculated on each update by the dispatcher
> ---
>
> Key: TEZ-4067
> URL: https://issues.apache.org/jira/browse/TEZ-4067
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Minor
> Attachments: TEZ-4067.001.patch, TEZ-4067.002.patch
>
>
> LegacySpeculator is an object field in VertexImpl. Therefore, all events are 
> handled synchronously by the caller (the dispatcher). This implies the following:
>  # the dispatcher spends a long time executing updateStatus, as it needs to 
> check the runtime estimates of the tezAttempts within the vertex.
>  # the speculator is per stage: launching a speculation may not be the optimal 
> decision. Ideally, based on available resources, the speculated tasks should 
> be the ones with the slowest progress.
>  # the time between speculations is skewed because there is a big delay for 
> the dispatcher to complete a full cycle. Also, speculation will be more 
> aggressive compared to MR because MR waits for 
> "soonest.retry.after.speculate" whenever a task is speculated. On the other 
> hand, Tez speculates more tasks as it processes stages in parallel.
>  
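
The three points above suggest decoupling the speculation decision from the dispatcher's update path. The following is only a minimal sketch of that idea under stated assumptions, not the LegacySpeculator API or the attached patches: the class `BackgroundSpeculatorSketch`, its methods, and the use of raw progress as a stand-in for runtime estimation are all hypothetical. The update call only records the latest progress, while the decision runs separately, picks the slowest attempt, and enforces a minimum wait after each speculation (similar in spirit to the MR back-off mentioned above).

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; none of these names come from the Tez code base.
final class BackgroundSpeculatorSketch {
  // Latest reported progress (0.0 to 1.0) per task attempt id.
  private final Map<String, Float> progress = new ConcurrentHashMap<>();
  // Attempts that already triggered a speculation.
  private final Set<String> speculated = new HashSet<>();
  private final long minIntervalMillis;
  private long lastSpeculationMillis;
  private boolean hasSpeculated;

  BackgroundSpeculatorSketch(long minIntervalMillis) {
    this.minIntervalMillis = minIntervalMillis;
  }

  // Intended for the dispatcher thread: a single map write, no estimation work.
  void updateStatus(String attemptId, float fraction) {
    progress.put(attemptId, fraction);
  }

  // Intended for a separate, periodically scheduled thread.
  // Returns the slowest unfinished attempt that has not been speculated yet,
  // or empty if the back-off interval has not elapsed or nothing qualifies.
  synchronized Optional<String> maybeSpeculate(long nowMillis) {
    if (hasSpeculated && nowMillis - lastSpeculationMillis < minIntervalMillis) {
      return Optional.empty();
    }
    String slowest = null;
    float slowestProgress = 1.0f; // attempts at 1.0 are treated as finished
    for (Map.Entry<String, Float> e : progress.entrySet()) {
      if (!speculated.contains(e.getKey()) && e.getValue() < slowestProgress) {
        slowest = e.getKey();
        slowestProgress = e.getValue();
      }
    }
    if (slowest == null) {
      return Optional.empty();
    }
    speculated.add(slowest);
    hasSpeculated = true;
    lastSpeculationMillis = nowMillis;
    return Optional.of(slowest);
  }
}
```

In such a design, `maybeSpeculate` could be driven by a `ScheduledExecutorService` with a fixed delay, so the interval between speculations no longer depends on how long the dispatcher takes to complete a cycle.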



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TEZ-4067) Tez Speculation decision is calculated on each update by the dispatcher

2019-05-30 Thread Ahmed Hussein (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852045#comment-16852045
 ] 

Ahmed Hussein commented on TEZ-4067:


TEZ-1897






[jira] [Comment Edited] (TEZ-4073) Configuration: Reduce Vertex and DAG Payload Size

2019-05-30 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16851545#comment-16851545
 ] 

Gopal V edited comment on TEZ-4073 at 5/30/19 6:07 AM:
---

Async dispatcher CPU is mostly spent on the protobuf codepaths.

The AM side shows hotspots in places like

{code}
public VertexManagerPluginDescriptor build() {
  VertexManagerPluginDescriptor desc =
      VertexManagerPluginDescriptor.create(
          RootInputVertexManager.class.getName());

  try {
    return desc.setUserPayload(TezUtils
        .createUserPayloadFromConf(this.conf));
  } catch (IOException e) {
    throw new TezUncheckedException(e);
  }
}
{code}


was (Author: gopalv):
Async dispatcher CPU is mostly spent on the protobuf codepaths.

> Configuration: Reduce Vertex and DAG Payload Size
> -
>
> Key: TEZ-4073
> URL: https://issues.apache.org/jira/browse/TEZ-4073
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Gopal V
> Priority: Major
> Attachments: tez-am-protobuf-reading.png, tez-protobuf-writing.png
>
>
> As the total number of vertices goes up, the Tez protobuf transport starts to 
> show up as a potential scalability problem for task submission and the AM.
> {code}
> public TezTaskRunner2(Configuration tezConf, UserGroupInformation ugi, 
>     String[] localDirs,
>  ...
>     this.taskConf = new Configuration(tezConf);
>     if (taskSpec.getTaskConf() != null) {
>       Iterator<Entry<String, String>> iter = 
>           taskSpec.getTaskConf().iterator();
>       while (iter.hasNext()) {
>         Entry<String, String> entry = iter.next();
>         taskConf.set(entry.getKey(), entry.getValue());
>       }
>     }
> {code}
> The TaskSpec getTaskConf() need not include any of the default configs, since 
> its keys are applied on top of an existing task conf (a copy of tezConf).
> {code}
> // Security framework already loaded the tokens into current ugi
> DAGProtos.ConfigurationProto confProto =
>     TezUtilsInternal.readUserSpecifiedTezConfiguration(
>         System.getenv(Environment.PWD.name()));
> TezUtilsInternal.addUserSpecifiedTezConfiguration(defaultConf, 
>     confProto.getConfKeyValuesList());
> UserGroupInformation.setConfiguration(defaultConf);
> Credentials credentials = 
>     UserGroupInformation.getCurrentUser().getCredentials();
> {code}
> At the very least, the DAG and Vertex do not both need to have the same 
> configs repeated in them.
>  !tez-protobuf-writing.png! 
> +
>  !tez-am-protobuf-reading.png! 
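
Since the task conf is built by copying tezConf and then setting the TaskSpec keys on top, only the keys whose values differ from the base need to travel in the payload. The sketch below illustrates that delta idea with plain `java.util.Map` instead of Hadoop `Configuration`, so it is self-contained; `ConfDeltaSketch`, `delta`, and `apply` are hypothetical names and not part of Tez. One assumption to note: keys present in the base but absent from the full conf (explicit unsets) are not represented in the delta.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch; not part of the Tez API.
final class ConfDeltaSketch {
  // Keeps only the entries of 'full' that are missing from 'base'
  // or carry a different value there.
  static Map<String, String> delta(Map<String, String> base, Map<String, String> full) {
    Map<String, String> out = new HashMap<>();
    for (Map.Entry<String, String> e : full.entrySet()) {
      if (!Objects.equals(base.get(e.getKey()), e.getValue())) {
        out.put(e.getKey(), e.getValue());
      }
    }
    return out;
  }

  // Mirrors the receiving side: copy the base, then set the delta keys on top.
  static Map<String, String> apply(Map<String, String> base, Map<String, String> delta) {
    Map<String, String> merged = new HashMap<>(base);
    merged.putAll(delta);
    return merged;
  }
}
```

As long as the full conf only adds or overrides keys relative to the base, applying the delta to the base reproduces the full conf while shipping fewer entries.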





[jira] [Commented] (TEZ-4073) Configuration: Reduce Vertex and DAG Payload Size

2019-05-30 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16851545#comment-16851545
 ] 

Gopal V commented on TEZ-4073:
--

Async dispatcher CPU is mostly spent on the protobuf codepaths.






[jira] [Updated] (TEZ-4073) Configuration: Reduce Vertex and DAG Payload Size

2019-05-30 Thread Gopal V (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-4073:
-
Description: 
As the total number of vertices goes up, the Tez protobuf transport starts to 
show up as a potential scalability problem for task submission and the AM.

{code}
public TezTaskRunner2(Configuration tezConf, UserGroupInformation ugi, 
    String[] localDirs,
 ...
    this.taskConf = new Configuration(tezConf);
    if (taskSpec.getTaskConf() != null) {
      Iterator<Entry<String, String>> iter = taskSpec.getTaskConf().iterator();
      while (iter.hasNext()) {
        Entry<String, String> entry = iter.next();
        taskConf.set(entry.getKey(), entry.getValue());
      }
    }
{code}

The TaskSpec getTaskConf() need not include any of the default configs, since 
its keys are applied on top of an existing task conf (a copy of tezConf).

{code}
// Security framework already loaded the tokens into current ugi
DAGProtos.ConfigurationProto confProto =
    TezUtilsInternal.readUserSpecifiedTezConfiguration(
        System.getenv(Environment.PWD.name()));
TezUtilsInternal.addUserSpecifiedTezConfiguration(defaultConf, 
    confProto.getConfKeyValuesList());
UserGroupInformation.setConfiguration(defaultConf);
Credentials credentials = 
    UserGroupInformation.getCurrentUser().getCredentials();
{code}

At the very least, the DAG and Vertex do not both need to have the same configs 
repeated in them.

 !tez-protobuf-writing.png! 
+
 !tez-am-protobuf-reading.png! 

  was:
As the total number of vertices go up, the Tez protobuf transport starts to 
show up as a potential scalability problem for the task submission and the AM

{code}
public TezTaskRunner2(Configuration tezConf, UserGroupInformation ugi, 
    String[] localDirs,
 ...
    this.taskConf = new Configuration(tezConf);
    if (taskSpec.getTaskConf() != null) {
      Iterator<Entry<String, String>> iter = taskSpec.getTaskConf().iterator();
      while (iter.hasNext()) {
        Entry<String, String> entry = iter.next();
        taskConf.set(entry.getKey(), entry.getValue());
      }
    }
{code}

The TaskSpec getTaskConf() need not include any of the default configs, since 
the keys are placed into an existing task conf.

{code}
// Security framework already loaded the tokens into current ugi
DAGProtos.ConfigurationProto confProto =
    TezUtilsInternal.readUserSpecifiedTezConfiguration(
        System.getenv(Environment.PWD.name()));
TezUtilsInternal.addUserSpecifiedTezConfiguration(defaultConf, 
    confProto.getConfKeyValuesList());
UserGroupInformation.setConfiguration(defaultConf);
Credentials credentials = 
    UserGroupInformation.getCurrentUser().getCredentials();
{code}

At the very least, the DAG and Vertex do not both need to have the same configs 
repeated in them.







[jira] [Updated] (TEZ-4073) Configuration: Reduce Vertex and DAG Payload Size

2019-05-30 Thread Gopal V (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-4073:
-
Attachment: tez-am-protobuf-reading.png






[jira] [Updated] (TEZ-4073) Configuration: Reduce Vertex and DAG Payload Size

2019-05-30 Thread Gopal V (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-4073:
-
Attachment: tez-protobuf-writing.png






[jira] [Created] (TEZ-4073) Configuration: Reduce Vertex and DAG Payload Size

2019-05-30 Thread Gopal V (JIRA)
Gopal V created TEZ-4073:


 Summary: Configuration: Reduce Vertex and DAG Payload Size
 Key: TEZ-4073
 URL: https://issues.apache.org/jira/browse/TEZ-4073
 Project: Apache Tez
 Issue Type: Bug
 Reporter: Gopal V


As the total number of vertices goes up, the Tez protobuf transport starts to 
show up as a potential scalability problem for task submission and the AM.

{code}
public TezTaskRunner2(Configuration tezConf, UserGroupInformation ugi, 
    String[] localDirs,
 ...
    this.taskConf = new Configuration(tezConf);
    if (taskSpec.getTaskConf() != null) {
      Iterator<Entry<String, String>> iter = taskSpec.getTaskConf().iterator();
      while (iter.hasNext()) {
        Entry<String, String> entry = iter.next();
        taskConf.set(entry.getKey(), entry.getValue());
      }
    }
{code}

The TaskSpec getTaskConf() need not include any of the default configs, since 
its keys are applied on top of an existing task conf (a copy of tezConf).

{code}
// Security framework already loaded the tokens into current ugi
DAGProtos.ConfigurationProto confProto =
    TezUtilsInternal.readUserSpecifiedTezConfiguration(
        System.getenv(Environment.PWD.name()));
TezUtilsInternal.addUserSpecifiedTezConfiguration(defaultConf, 
    confProto.getConfKeyValuesList());
UserGroupInformation.setConfiguration(defaultConf);
Credentials credentials = 
    UserGroupInformation.getCurrentUser().getCredentials();
{code}

At the very least, the DAG and Vertex do not both need to have the same configs 
repeated in them.
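
One way to picture the last point is to factor the entries shared by every vertex into a single DAG-level map and keep only per-vertex overrides. This is a rough sketch under assumptions, not how Tez lays out its protobuf payloads: `SharedConfSketch`, `common`, and `overrides` are hypothetical names, configs are modelled as plain `Map<String, String>`, and values are assumed to be non-null.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch; not part of the Tez API.
final class SharedConfSketch {
  // Entries that appear with the same key and value in every vertex conf.
  static Map<String, String> common(List<Map<String, String>> vertexConfs) {
    if (vertexConfs.isEmpty()) {
      return new HashMap<>();
    }
    Map<String, String> shared = new HashMap<>(vertexConfs.get(0));
    for (Map<String, String> conf : vertexConfs) {
      // Drop any candidate that this vertex lacks or sets differently.
      shared.entrySet().removeIf(e -> !Objects.equals(e.getValue(), conf.get(e.getKey())));
    }
    return shared;
  }

  // What a single vertex still has to carry once the shared part is stored once.
  static Map<String, String> overrides(Map<String, String> shared, Map<String, String> vertexConf) {
    Map<String, String> out = new HashMap<>(vertexConf);
    out.entrySet().removeIf(e -> Objects.equals(e.getValue(), shared.get(e.getKey())));
    return out;
  }
}
```

With many vertices that mostly share the same settings, the shared map would be serialized once and each vertex would carry only its few differing keys.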


