[jira] [Issue Comment Deleted] (TEZ-4067) Tez Speculation decision is calculated on each update by the dispatcher
[ https://issues.apache.org/jira/browse/TEZ-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmed Hussein updated TEZ-4067:
---
    Comment: was deleted

(was: TEZ-1897)

> Tez Speculation decision is calculated on each update by the dispatcher
> ---
>
> Key: TEZ-4067
> URL: https://issues.apache.org/jira/browse/TEZ-4067
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Minor
> Attachments: TEZ-4067.001.patch, TEZ-4067.002.patch
>
>
> LegacySpeculator is an object field in VertexImpl. Therefore, all events are
> handled synchronously by the caller (dispatcher). This implies the following:
> # the dispatcher spends a long time executing updateStatus, as it needs to
> check the runtime estimation of the tezAttempts within the vertex.
> # the speculator is per stage: launching a speculation may not be the optimal
> decision. Ideally, based on resources, speculated tasks should be the ones
> with the slowest progress.
> # the time between speculations is skewed because there is a big delay for
> the dispatcher to complete a full cycle. Also, speculation will be more
> aggressive compared to MR because MR waits for
> "soonest.retry.after.speculate" whenever a task is speculated. On the other
> hand, Tez speculates more tasks as it processes stages in parallel.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
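The three points in the description above suggest one possible shape for a fix: keep updateStatus cheap on the dispatcher thread, and let a separate periodic scan choose the slowest attempt while waiting between speculations, in the spirit of MR's "soonest.retry.after.speculate". The following is only a minimal sketch under those assumptions. The class BackgroundSpeculatorSketch, its methods, and the progress-as-a-fraction model are hypothetical; this is not the LegacySpeculator API, nor what the attached patches do.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; none of these names exist in Tez. */
final class BackgroundSpeculatorSketch {
  // Latest reported progress (0.0 to 1.0) per attempt id.
  private final Map<String, Double> progressByAttempt = new ConcurrentHashMap<>();
  // Assumed wait after a speculation, modelled on MR's retry-after-speculate idea.
  private final long retryAfterSpeculateMs;
  private long nextScanAllowedMs = 0L;

  BackgroundSpeculatorSketch(long retryAfterSpeculateMs) {
    this.retryAfterSpeculateMs = retryAfterSpeculateMs;
  }

  /** Meant for the dispatcher thread: records progress only, no estimation work. */
  void updateStatus(String attemptId, double progress) {
    progressByAttempt.put(attemptId, progress);
  }

  /**
   * Meant for the speculator's own thread. Returns the unfinished attempt with
   * the lowest progress, or empty if still inside the wait window or nothing
   * is left to speculate.
   */
  synchronized Optional<String> scanOnce(long nowMs) {
    if (nowMs < nextScanAllowedMs) {
      return Optional.empty();
    }
    Optional<String> slowest = progressByAttempt.entrySet().stream()
        .filter(e -> e.getValue() < 1.0)
        .min(Map.Entry.comparingByValue())
        .map(Map.Entry::getKey);
    if (slowest.isPresent()) {
      // Simplification: forget the attempt so it is not picked again
      // unless a later status update re-registers it.
      progressByAttempt.remove(slowest.get());
      nextScanAllowedMs = nowMs + retryAfterSpeculateMs;
    }
    return slowest;
  }
}
```

In a real AM the scan would presumably run on a timer in its own thread; here the clock is passed in so the behaviour stays deterministic. Real speculation decisions would also need runtime estimates and resource checks, which this sketch leaves out.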
[jira] [Commented] (TEZ-4067) Tez Speculation decision is calculated on each update by the dispatcher
[ https://issues.apache.org/jira/browse/TEZ-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852045#comment-16852045 ]

Ahmed Hussein commented on TEZ-4067:
---

TEZ-1897
[jira] [Comment Edited] (TEZ-4073) Configuration: Reduce Vertex and DAG Payload Size
[ https://issues.apache.org/jira/browse/TEZ-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16851545#comment-16851545 ]

Gopal V edited comment on TEZ-4073 at 5/30/19 6:07 AM:
---

Async dispatcher CPU is mostly spent on the protobuf codepaths.

The AM side shows hotspots in places like

{code}
public VertexManagerPluginDescriptor build() {
  VertexManagerPluginDescriptor desc =
      VertexManagerPluginDescriptor.create(
          RootInputVertexManager.class.getName());
  try {
    return desc.setUserPayload(TezUtils
        .createUserPayloadFromConf(this.conf));
  } catch (IOException e) {
    throw new TezUncheckedException(e);
  }
}
{code}

was (Author: gopalv):
Async dispatcher CPU is mostly spent on the protobuf codepaths.

> Configuration: Reduce Vertex and DAG Payload Size
> ---
>
> Key: TEZ-4073
> URL: https://issues.apache.org/jira/browse/TEZ-4073
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Gopal V
> Priority: Major
> Attachments: tez-am-protobuf-reading.png, tez-protobuf-writing.png
>
>
> As the total number of vertices goes up, the Tez protobuf transport starts to
> show up as a potential scalability problem for the task submission and the AM.
> {code}
>   public TezTaskRunner2(Configuration tezConf, UserGroupInformation ugi,
>       String[] localDirs,
>   ...
>     this.taskConf = new Configuration(tezConf);
>     if (taskSpec.getTaskConf() != null) {
>       Iterator<Entry<String, String>> iter = taskSpec.getTaskConf().iterator();
>       while (iter.hasNext()) {
>         Entry<String, String> entry = iter.next();
>         taskConf.set(entry.getKey(), entry.getValue());
>       }
>     }
> {code}
> The TaskSpec getTaskConf() need not include any of the default configs, since
> the keys are placed into an existing task conf.
> {code}
>     // Security framework already loaded the tokens into current ugi
>     DAGProtos.ConfigurationProto confProto =
>         TezUtilsInternal.readUserSpecifiedTezConfiguration(System.getenv(Environment.PWD.name()));
>     TezUtilsInternal.addUserSpecifiedTezConfiguration(defaultConf,
>         confProto.getConfKeyValuesList());
>     UserGroupInformation.setConfiguration(defaultConf);
>     Credentials credentials =
>         UserGroupInformation.getCurrentUser().getCredentials();
> {code}
> At the very least, the DAG and Vertex do not both need to have the same
> configs repeated in them.
> !tez-protobuf-writing.png!
> +
> !tez-am-protobuf-reading.png!
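Since the task-side constructor quoted above overlays getTaskConf() onto a copy of the base conf, one way to shrink the payload is to send only the keys whose values differ from the defaults and let the receiver reapply them. The sketch below illustrates that idea with plain java.util maps rather than Hadoop's Configuration, to stay self-contained. ConfDeltaSketch, delta, and overlay are hypothetical names and are not part of TezUtils or TezUtilsInternal.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical sketch; not a Tez or Hadoop API. */
final class ConfDeltaSketch {
  private ConfDeltaSketch() {}

  /** Sender side: keep only entries that are new or differ from the defaults. */
  static Map<String, String> delta(Map<String, String> defaults,
                                   Map<String, String> effective) {
    Map<String, String> out = new TreeMap<>();
    for (Map.Entry<String, String> e : effective.entrySet()) {
      if (!Objects.equals(defaults.get(e.getKey()), e.getValue())) {
        out.put(e.getKey(), e.getValue());
      }
    }
    return out;
  }

  /** Receiver side: start from the defaults and overlay the shipped entries. */
  static Map<String, String> overlay(Map<String, String> defaults,
                                     Map<String, String> delta) {
    Map<String, String> merged = new TreeMap<>(defaults);
    merged.putAll(delta);
    return merged;
  }
}
```

One limitation to keep in mind: a key that is present in the defaults but deliberately unset in the effective conf is not represented in the delta, so the overlay would bring it back. The same delta approach could in principle be applied between a DAG conf and each Vertex conf, so that a Vertex carries only what differs from its DAG.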
[jira] [Commented] (TEZ-4073) Configuration: Reduce Vertex and DAG Payload Size
[ https://issues.apache.org/jira/browse/TEZ-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16851545#comment-16851545 ]

Gopal V commented on TEZ-4073:
---

Async dispatcher CPU is mostly spent on the protobuf codepaths.
[jira] [Updated] (TEZ-4073) Configuration: Reduce Vertex and DAG Payload Size
[ https://issues.apache.org/jira/browse/TEZ-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated TEZ-4073:
---
    Description:
As the total number of vertices goes up, the Tez protobuf transport starts to show up as a potential scalability problem for the task submission and the AM.

{code}
  public TezTaskRunner2(Configuration tezConf, UserGroupInformation ugi,
      String[] localDirs,
  ...
    this.taskConf = new Configuration(tezConf);
    if (taskSpec.getTaskConf() != null) {
      Iterator<Entry<String, String>> iter = taskSpec.getTaskConf().iterator();
      while (iter.hasNext()) {
        Entry<String, String> entry = iter.next();
        taskConf.set(entry.getKey(), entry.getValue());
      }
    }
{code}

The TaskSpec getTaskConf() need not include any of the default configs, since the keys are placed into an existing task conf.

{code}
    // Security framework already loaded the tokens into current ugi
    DAGProtos.ConfigurationProto confProto =
        TezUtilsInternal.readUserSpecifiedTezConfiguration(System.getenv(Environment.PWD.name()));
    TezUtilsInternal.addUserSpecifiedTezConfiguration(defaultConf,
        confProto.getConfKeyValuesList());
    UserGroupInformation.setConfiguration(defaultConf);
    Credentials credentials =
        UserGroupInformation.getCurrentUser().getCredentials();
{code}

At the very least, the DAG and Vertex do not both need to have the same configs repeated in them.

!tez-protobuf-writing.png!
+
!tez-am-protobuf-reading.png!

  was:
As the total number of vertices goes up, the Tez protobuf transport starts to show up as a potential scalability problem for the task submission and the AM.

{code}
  public TezTaskRunner2(Configuration tezConf, UserGroupInformation ugi,
      String[] localDirs,
  ...
    this.taskConf = new Configuration(tezConf);
    if (taskSpec.getTaskConf() != null) {
      Iterator<Entry<String, String>> iter = taskSpec.getTaskConf().iterator();
      while (iter.hasNext()) {
        Entry<String, String> entry = iter.next();
        taskConf.set(entry.getKey(), entry.getValue());
      }
    }
{code}

The TaskSpec getTaskConf() need not include any of the default configs, since the keys are placed into an existing task conf.

{code}
    // Security framework already loaded the tokens into current ugi
    DAGProtos.ConfigurationProto confProto =
        TezUtilsInternal.readUserSpecifiedTezConfiguration(System.getenv(Environment.PWD.name()));
    TezUtilsInternal.addUserSpecifiedTezConfiguration(defaultConf,
        confProto.getConfKeyValuesList());
    UserGroupInformation.setConfiguration(defaultConf);
    Credentials credentials =
        UserGroupInformation.getCurrentUser().getCredentials();
{code}

At the very least, the DAG and Vertex do not both need to have the same configs repeated in them.
[jira] [Updated] (TEZ-4073) Configuration: Reduce Vertex and DAG Payload Size
[ https://issues.apache.org/jira/browse/TEZ-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated TEZ-4073:
---
    Attachment: tez-am-protobuf-reading.png
[jira] [Updated] (TEZ-4073) Configuration: Reduce Vertex and DAG Payload Size
[ https://issues.apache.org/jira/browse/TEZ-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated TEZ-4073:
---
    Attachment: tez-protobuf-writing.png
[jira] [Created] (TEZ-4073) Configuration: Reduce Vertex and DAG Payload Size
Gopal V created TEZ-4073:
---

Summary: Configuration: Reduce Vertex and DAG Payload Size
Key: TEZ-4073
URL: https://issues.apache.org/jira/browse/TEZ-4073
Project: Apache Tez
Issue Type: Bug
Reporter: Gopal V

As the total number of vertices goes up, the Tez protobuf transport starts to show up as a potential scalability problem for the task submission and the AM.

{code}
  public TezTaskRunner2(Configuration tezConf, UserGroupInformation ugi,
      String[] localDirs,
  ...
    this.taskConf = new Configuration(tezConf);
    if (taskSpec.getTaskConf() != null) {
      Iterator<Entry<String, String>> iter = taskSpec.getTaskConf().iterator();
      while (iter.hasNext()) {
        Entry<String, String> entry = iter.next();
        taskConf.set(entry.getKey(), entry.getValue());
      }
    }
{code}

The TaskSpec getTaskConf() need not include any of the default configs, since the keys are placed into an existing task conf.

{code}
    // Security framework already loaded the tokens into current ugi
    DAGProtos.ConfigurationProto confProto =
        TezUtilsInternal.readUserSpecifiedTezConfiguration(System.getenv(Environment.PWD.name()));
    TezUtilsInternal.addUserSpecifiedTezConfiguration(defaultConf,
        confProto.getConfKeyValuesList());
    UserGroupInformation.setConfiguration(defaultConf);
    Credentials credentials =
        UserGroupInformation.getCurrentUser().getCredentials();
{code}

At the very least, the DAG and Vertex do not both need to have the same configs repeated in them.