DickJC123 commented on a change in pull request #14200: Bulked op segments to allow Variable nodes
URL: https://github.com/apache/incubator-mxnet/pull/14200#discussion_r261811596
##########
File path: src/executor/graph_executor.cc
##########
@@ -1211,63 +1212,53 @@ void GraphExecutor::InitOpSegs() {
 void GraphExecutor::BulkTrainingOpSegs(size_t total_num_nodes) {
-  // The maximum number of node in a segment executed in bulk
-  size_t num_nodes_threshold = dmlc::GetEnv("MXNET_EXEC_BULK_EXEC_MAX_NODE_TRAIN", 15);
+  // The maximum number of nodes in a segment executed in bulk (excluding variables) in fwd pass.
+  size_t segment_num_nodes_threshold_fwd = Imperative::BulkExecMaxNodeTrainFwd();
+  // The maximum number of nodes in a segment executed in bulk (excluding variables) in bwd pass.
+  size_t segment_num_nodes_threshold_bwd = Imperative::BulkExecMaxNodeTrainBwd();
   // create forward segments for training
   size_t topo_start = 0;
+  size_t segment_node_count = 0;
   for (size_t nid = 0; nid < num_forward_nodes_; nid++) {

Review comment:
I think having a single function to bulk both forward and backward is a good idea, such as:
```
void GraphExecutor::BulkTrainingOpSegsOverRange(size_t from_node,
                                                size_t up_to_node,
                                                size_t max_nodes_per_segment) {
  ...
}
```
The only issue to work out is whether there's a good reason why in forward we have:
```
bool ignore_node = node->is_variable();
```
while in backward we have the more complicated:
```
bool ignore_node = node->is_variable() || op_node.skip_exec_node || op_node.exec == nullptr;
```
Do you know by any chance? The goal would be to unify these two as much as possible.
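To make the proposal concrete, here is a minimal standalone sketch of a range-based segmenter in which the forward/backward difference is reduced to an `ignore_node` predicate passed by the caller. It is not the MXNet implementation: `SketchNode`, `Segment`, `BulkOpSegsOverRange`, `IgnoreForward`, and `IgnoreBackward` are hypothetical names, and the real executor may apply additional rules for closing a segment that are not modeled here. Only the two `ignore_node` expressions are taken from the comment above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the executor's per-node state (not MXNet's OpNode).
struct SketchNode {
  bool is_variable = false;
  bool skip_exec_node = false;
  bool has_exec = true;  // models `op_node.exec != nullptr`
};

// Half-open [start, end) range of node ids that would be bulked together.
using Segment = std::pair<std::size_t, std::size_t>;

// Forward rule quoted in the review comment.
bool IgnoreForward(const SketchNode& n) { return n.is_variable; }

// Backward rule quoted in the review comment.
bool IgnoreBackward(const SketchNode& n) {
  return n.is_variable || n.skip_exec_node || !n.has_exec;
}

// Walks [from_node, up_to_node) and closes a segment once it contains
// max_nodes_per_segment nodes that the predicate does not ignore.
// Ignored nodes stay inside the range but do not count toward the limit.
std::vector<Segment> BulkOpSegsOverRange(
    const std::vector<SketchNode>& nodes,
    std::size_t from_node,
    std::size_t up_to_node,
    std::size_t max_nodes_per_segment,
    const std::function<bool(const SketchNode&)>& ignore_node) {
  assert(from_node <= up_to_node && up_to_node <= nodes.size());
  std::vector<Segment> segments;
  std::size_t topo_start = from_node;
  std::size_t segment_node_count = 0;
  for (std::size_t nid = from_node; nid < up_to_node; ++nid) {
    if (ignore_node(nodes[nid])) continue;
    ++segment_node_count;
    if (segment_node_count >= max_nodes_per_segment) {
      segments.emplace_back(topo_start, nid + 1);
      topo_start = nid + 1;
      segment_node_count = 0;
    }
  }
  // Flush a trailing, partially filled segment if it holds any counted node.
  if (segment_node_count > 0) segments.emplace_back(topo_start, up_to_node);
  return segments;
}
```

With this shape, the forward pass would call the helper with `IgnoreForward` and the forward threshold, and the backward pass with `IgnoreBackward` and the backward threshold. If the stricter backward predicate turns out to be harmless for forward nodes as well, the predicate parameter could be dropped and the two code paths would become identical.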