[GitHub] [tvm] jcf94 commented on a change in pull request #7197: [Fix][Autoscheduler] Costmodel enhancement & bug fix for graph debug runtime

2021-01-05 Thread GitBox


jcf94 commented on a change in pull request #7197:
URL: https://github.com/apache/tvm/pull/7197#discussion_r551830662



##
File path: python/tvm/auto_scheduler/cost_model/xgb_model.py
##
@@ -116,12 +117,17 @@ def __init__(self, verbose_eval=25, num_warmup_sample=100, seed=None):
         self.plan_size = 32
         self.num_warmup_sample = num_warmup_sample
         self.verbose_eval = verbose_eval
+        self.model_file = model_file
+        if model_file:
+            logger.info("XGBModel: Load pretrained model from %s...", model_file)
+            self.load(model_file)

Review comment:
   Removed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] jcf94 commented on a change in pull request #7197: [Fix][Autoscheduler] Costmodel enhancement & bug fix for graph debug runtime

2021-01-05 Thread GitBox


jcf94 commented on a change in pull request #7197:
URL: https://github.com/apache/tvm/pull/7197#discussion_r551830554



##
File path: python/tvm/auto_scheduler/cost_model/xgb_model.py
##
@@ -141,6 +147,12 @@ def update(self, inputs, results):
         self.inputs.extend(inputs)
         self.results.extend(results)
 
+        if len(self.inputs) - self.last_train_length < self.last_train_length / 5:

Review comment:
   Added.









[GitHub] [tvm] jcf94 commented on a change in pull request #7197: [Fix][Autoscheduler] Costmodel enhancement & bug fix for graph debug runtime

2021-01-05 Thread GitBox


jcf94 commented on a change in pull request #7197:
URL: https://github.com/apache/tvm/pull/7197#discussion_r551816877



##
File path: python/tvm/auto_scheduler/task_scheduler.py
##
@@ -82,11 +82,12 @@ def make_search_policies(
     if isinstance(search_policy, str):
         policy_type, model_type = search_policy.split(".")
         if model_type == "xgb":
-            cost_model = XGBModel(num_warmup_sample=len(tasks) * num_measures_per_round)
-            if load_model_file:
-                logger.info("TaskScheduler: Load pretrained model...")
-                cost_model.load(load_model_file)
-            elif load_log_file:
+            cost_model = XGBModel(
+                num_warmup_sample=len(tasks) * num_measures_per_round,
+                model_file=load_model_file,
+            )
+            if load_log_file:
+                logger.info("TaskScheduler: Reload measured states and train the model...")

Review comment:
   The old one is fine. I was just going to add a `self.model_file` for cost model saving after training; this part was modified along the way.









[GitHub] [tvm] jcf94 commented on a change in pull request #7197: [Fix][Autoscheduler] Costmodel enhancement & bug fix for graph debug runtime

2021-01-04 Thread GitBox


jcf94 commented on a change in pull request #7197:
URL: https://github.com/apache/tvm/pull/7197#discussion_r551675372



##
File path: src/runtime/graph/debug/graph_runtime_debug.cc
##
@@ -153,9 +153,10 @@ class GraphRuntimeDebug : public GraphRuntime {
     const TVMContext& ctx = data_entry_[entry_id(index, 0)]->ctx;
     TVMSynchronize(ctx.device_type, ctx.device_id, nullptr);
     auto op_tend = std::chrono::high_resolution_clock::now();
-    double op_duration =
-        std::chrono::duration_cast<std::chrono::duration<double>>(op_tend - op_tbegin).count();
-    return op_duration;
+    double op_duration_us =
+        std::chrono::duration_cast<std::chrono::duration<double>>(op_tend - op_tbegin).count() *

Review comment:
   Thanks! This does not block my PR, so I'll remove this part from here.
   I've opened a new issue, #7207, to track this bug.









[GitHub] [tvm] jcf94 commented on a change in pull request #7197: [Fix][Autoscheduler] Costmodel enhancement & bug fix for graph debug runtime

2021-01-04 Thread GitBox


jcf94 commented on a change in pull request #7197:
URL: https://github.com/apache/tvm/pull/7197#discussion_r551211822



##
File path: python/tvm/auto_scheduler/cost_model/xgb_model.py
##
@@ -141,6 +146,12 @@ def update(self, inputs, results):
         self.inputs.extend(inputs)
         self.results.extend(results)
 
+        if len(self.inputs) - self.last_train_length < self.last_train_length / 5:

Review comment:
   When there are too many logs, the auto scheduler will spend more time on cost model training than on program measuring in each search round.
   This modification reduces how often the cost model is trained: at the beginning we train it in every search round, and once many logs have accumulated we only train it every few rounds.
   
   `len(self.inputs) - self.last_train_length` is the number of measure pairs added since the last training; we should not just use `len(inputs)`, because the incoming `inputs` and `results` are extended onto `self.inputs` and `self.results`.
   And the number 5 is just a magic number I picked; I'm not sure how to choose a better threshold.
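   The throttling rule above can be sketched as a minimal standalone class (hypothetical names, not the actual TVM `XGBModel` code; it only illustrates the `last_train_length / 5` threshold):
   
   ```python
   class ThrottledTrainer:
       """Accumulates measure pairs; signals a retrain only on >= 20% growth."""

       def __init__(self):
           self.inputs = []
           self.results = []
           self.last_train_length = 0  # dataset size at the last training

       def update(self, inputs, results):
           self.inputs.extend(inputs)
           self.results.extend(results)
           # Skip training while fewer than last_train_length / 5 new
           # measure pairs have accumulated since the last training round.
           if len(self.inputs) - self.last_train_length < self.last_train_length / 5:
               return False  # training skipped this round
           self.last_train_length = len(self.inputs)
           return True  # caller would retrain the cost model here
   ```
   
   With this rule, training happens in every round early on (the threshold is near zero) and only every few rounds once the dataset is large.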









[GitHub] [tvm] jcf94 commented on a change in pull request #7197: [Fix][Autoscheduler] Costmodel enhancement & bug fix for graph debug runtime

2021-01-04 Thread GitBox


jcf94 commented on a change in pull request #7197:
URL: https://github.com/apache/tvm/pull/7197#discussion_r551206249



##
File path: src/auto_scheduler/feature.cc
##
@@ -1462,12 +1462,18 @@ void GetPerStoreFeaturesFromMeasurePairs(const Array<MeasureInput>& inputs,
  if (find_res == task_cache.end()) {
    if (inputs[i]->task->compute_dag.defined()) {  // the measure input is complete
      task = inputs[i]->task;
-  } else {  // the measure input is incomplete
-    // rebuild task for incomplete measure pairs read from file
-    Array<te::Tensor> tensors = (*workload_key_to_tensors)(workload_key);
-    task = SearchTask(ComputeDAG(tensors), workload_key, inputs[i]->task->target,
-                      inputs[i]->task->target_host, inputs[i]->task->hardware_params,
-                      inputs[i]->task->layout_rewrite_option);
+  } else {
+    // The measure input is incomplete, rebuild task for incomplete measure pairs read from file
+    try {
+      Array<te::Tensor> tensors = (*workload_key_to_tensors)(workload_key);
+      task = SearchTask(ComputeDAG(tensors), workload_key, inputs[i]->task->target,
+                        inputs[i]->task->target_host, inputs[i]->task->hardware_params,
+                        inputs[i]->task->layout_rewrite_option);
+    } catch (std::exception& e) {
+      // Cannot build ComputeDAG from workload key, the task may have not been registered in
+      // this search round
+      continue;
Review comment:
   Emm ... I think this should be fine here.
   For example, suppose I have a `log.json` which contains 100 tasks, and next time I would like to tune only the last 10 tasks (in another `python _tune.py` call). Without this `try ... catch`, we'll get an error when reading the first 90 tasks because they have not been registered.
   This modification makes it work by skipping the logs of the first 90 tasks.
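   The skip-on-unregistered-workload behavior can be sketched in Python (hypothetical helper names; the real logic lives in the C++ `GetPerStoreFeaturesFromMeasurePairs`):
   
   ```python
   def load_measure_pairs(log_entries, registered_workloads):
       """Keep only log entries whose workload is registered in this round."""
       tasks = []
       for entry in log_entries:
           workload_key = entry["workload_key"]
           try:
               # Mirrors the ComputeDAG reconstruction: raise if the workload
               # key was never registered in this search round.
               if workload_key not in registered_workloads:
                   raise KeyError(workload_key)
               tasks.append(workload_key)
           except KeyError:
               continue  # silently skip logs of tasks we are not tuning
       return tasks
   ```
   
   So a log with 100 tasks can be replayed while only 10 of them are registered; the other 90 entries are skipped instead of raising an error.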








