mehrdadh commented on code in PR #12539:
URL: https://github.com/apache/tvm/pull/12539#discussion_r951656042


##########
python/tvm/micro/testing/evaluation.py:
##########
@@ -154,6 +154,6 @@ def evaluate_model_accuracy(session, aot_executor, input_data, true_labels, runs
         aot_runtimes.append(runtime)
 
     num_correct = sum(u == v for u, v in zip(true_labels, predicted_labels))
-    average_time = sum(aot_runtimes) / len(aot_runtimes)
+    average_time = np.median(aot_runtimes)

Review Comment:
   As a helper function, I don't think we want to make the decision here on how to interpret the data, especially when the data contains anomalies. To fix this issue, I suggest we return the list of runtimes and let users handle it based on their use case.
   For example, I could see someone sorting the data, dropping values outside the 10th–90th percentile, and then taking the average.
   wdyt?
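The trimming approach described above could be sketched as follows (the helper name and sample values are illustrative, not part of the PR; the caller receives the raw runtime list and aggregates it however it likes):

```python
import numpy as np

def trimmed_mean(runtimes, trim_fraction=0.1):
    """Average runtimes after dropping the lowest and highest
    trim_fraction of samples (hypothetical helper, not from the PR)."""
    data = np.sort(np.asarray(runtimes, dtype=float))
    k = int(len(data) * trim_fraction)
    # Keep the middle portion; fall back to all data if trimming
    # would remove every sample.
    trimmed = data[k : len(data) - k] if len(data) > 2 * k else data
    return float(trimmed.mean())

# Example: one anomalous runtime (0.5 s) among otherwise stable samples
runtimes = [0.010, 0.010, 0.010, 0.011, 0.011,
            0.011, 0.011, 0.012, 0.012, 0.500]
print(trimmed_mean(runtimes))  # the 0.5 s outlier is excluded
```

This keeps the aggregation policy out of the helper: `evaluate_model_accuracy` would just return `aot_runtimes`, and users who want a plain mean, a median, or a trimmed mean can compute it themselves.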



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
