[Crash-utility] Re: [PATCH] Fix the "ps -m" command shows wrong duration of RU task

Tao Liu Wed, 14 May 2025 16:45:16 -0700

Hi Ke Yin,

On Wed, May 14, 2025 at 8:58 PM Ke Yin <k...@redhat.com> wrote:
>
> Hi Tao Liu & Kazu,
>
> Thanks for replying and sharing your thoughts.
>
> After a quick review of crash tool code, I found:
>
> runq -m will call dump_on_rq_milliseconds() to print the amount
> of time that the active task on each cpu has been running,
> but only for the current running task.
>
> runq -d will call dump_on_rq_tasks() to print all tasks in the run queue
> and the task running on cpu without calling translate_nanoseconds().
>
> My preliminary idea is to combine these two functions and add a new
> parameter, for example -q, to print the tasks on each cpu that has
> been waiting in the run queue only. And as well as update doc of runq.
>
> In short:
> runq -q will call new_function which is the modified function based on 
> dump_on_rq_tasks() (skip current + translate_nanoseconds).
>
> What do you think?


I'm OK with your idea in general. Please check if I understood
correctly, your implementation is like:
cmd_runq() {
...
  if (-d option) {
    dump_on_rq_tasks(old path);
  } else if (-q option) {
    dump_on_rq_tasks(new path);
  }
}

dump_on_rq_tasks(option)
{
  ...
  for (i = 0; i < RUNNING_TASKS(); i++, tc++) {
    if (old path) // Old path stay unchanged
      dump_task_runq_entry(tc, 0);
    else // New path will output your time duration
      your_new_function_with_translate_nanoseconds();
  }
}

Thanks,
Tao Liu

>
> Thanks
> Kenneth Yin
>
>
>
>
> On Mon, May 12, 2025 at 1:36 PM Tao Liu <l...@redhat.com> wrote:
>>
>> Hi Kazu & Kenneth,
>>
>> Sorry for the late reply, and thanks for your fix and comments!
>>
>> On Thu, May 8, 2025 at 12:20 PM HAGIO KAZUHITO(萩尾　一仁)
>> <k-hagio...@nec.com> wrote:
>> >
>> > On 2025/05/07 16:16, HAGIO KAZUHITO(萩尾 一仁) wrote:
>> > > Hi,
>> > >
>> > > On 2025/04/28 19:38, Kenneth Yin wrote:
>> > >> The RU/TASK_RUNNING stat means the task is runnable.
>> > >> It is either currently running or on a run queue waiting to run.
>> > >>
>> > >> Currently, the crash tool uses the "rq_clock - 
>> > >> sched_info->last_arrival" formula to
>> > >> calculate the duration of task in RU state. This is for the scenario of 
>> > >> a task running on a CPU.
>> > >
>> > > The "ps -l" and "ps -m" options display what their help text describes,
>> > > not the duration of task in RU state.  Please see "help ps".
>> > >
>> > > Also, tasks are sorted by the value, using different values for it could
>> > > make another confusion.
>> > >
>> > > The options have been used for a long time with the current code, if we
>> > > change the semantics of the options, it would be better to be careful.
>> > > The change might lose a kind of information instead of getting another
>> > > kind of information.
>> > >
>> > > On the other hand, I think that the duration of waiting in queue might
>> > > also be useful information.  I'm not sure how we should display them,
>> > > but for example, how about adding a new option or adding a column for
>> > > last_queued?
>> >
>> > I thought of that the "runq" command might be suitable to display the
>> > waiting duration, because only tasks in the run queues have it.  For
>> > example, extending the "runq -m" option or adding a new option.  just my
>> > thought.
>> >
>> > Thanks,
>> > Kazu
>> >
>> > >
>> > > What do you think, folks?
>> > >
>> > > Thanks,
>> > > Kazu
>> > >
>> > >>
>> > >> But for the scenario of a task waiting in the  CPU run queue (due to 
>> > >> some reason
>> > >> for example cfs/rt queue throttled), this formula could cause 
>> > >> misunderstanding.
>> > >>
>> > >> For example:
>> > >> [ 220 10:36:38.026] [RU]  PID: 12345   TASK: ffff8d674ab6b180  CPU: 1   
>> > >>      COMMAND: "task"
>> > >>
>> > >> Looking closer:
>> > >>
>> > >> crash> rq.clock ffff8de438a5acc0
>> > >>     clock = 87029229985307234,
>> > >>
>> > >> crash> task -R sched_info,se.exec_start
>> > >> PID: 12345   TASK: ffff8d674ab6b180  CPU: 1  COMMAND: "task"
>> > >>     sched_info = {
>> > >>      pcount = 33,
>> > >>      run_delay = 0,
>> > >>      last_arrival = 67983031958439673,
>> > >>      last_queued = 87029224561119369
>> > >>     },
>> > >>     se.exec_start = 67983031958476937,
>> > >>
>> > >> 67983031         67983031                 87029224              87029229
>> > >> |<-   running on CPU  ->| <-      IN    ->|<-    waiting in queue    ->|
>> > >>
>> > >> For this scenario, the "task" was waiting in the run queue of the CPU 
>> > >> only for 5 seconds,
>> > >> we should use the "rq_clock - sched_info->last_queued" formula.
>>
>> Please check if my understanding is correct:
>>
>> The result you saw is "rq_clock - sched_info->last_arrival == 87029229
>> - 67983031 == 19046198"
>> The expected result you want is: "rq_clock - sched_info->last_queued
>> == 87029229 - 87029224 == 5"
>>
>> You think the 19046198 value is misleading and should be 5 which only
>> contains the waiting in queue duration, am I correct?
>>
>> I agree with Kazu's idea, that we shouldn't change the existing ps
>> cmd's behaviour, and runq is a better alternative for the
>> waiting-in-queue duration display.
>>
>> What do you think? Could you please improve your code as well as an
>> updated "help runq" doc for runq?
>>
>> Thanks,
>> Tao Liu
>>
>> > >>
>> > >> We can trust sched_info->last_queued as it is only set when the task 
>> > >> enters the CPU run queue.
>> > >> Furthermore, when the task hits/runs on a CPU or dequeues the CPU run 
>> > >> queue, it will be reset to 0.
>> > >>
>> > >> Therefore, my idea is simple:
>> > >>
>> > >> If a task in RU stat and sched_info->last_queued has value (!= 0),
>> > >> it means this task is waiting in the run queue, use "rq_clock - 
>> > >> sched_info->last_queued".
>> > >>
>> > >> Otherwise, if a task in RU stat and sched_info->last_queued = 0
>> > >> and sched_info->last_arrival has value (it must be), it means this task 
>> > >> is running on the CPU,
>> > >> use "rq_clock - sched_info->last_arrival".
>> > >>
>> > >> Signed-off-by: Kenneth Yin <k...@redhat.com>
>> > >> ---
>> > >>    defs.h    |  1 +
>> > >>    symbols.c |  2 ++
>> > >>    task.c    | 21 +++++++++++++++------
>> > >>    3 files changed, 18 insertions(+), 6 deletions(-)
>> > >>
>> > >> diff --git a/defs.h b/defs.h
>> > >> index 4cf169c..66f5ce4 100644
>> > >> --- a/defs.h
>> > >> +++ b/defs.h
>> > >> @@ -1787,6 +1787,7 @@ struct offset_table {                    /* stash 
>> > >> of commonly-used offsets */
>> > >>      long vcpu_struct_rq;
>> > >>      long task_struct_sched_info;
>> > >>      long sched_info_last_arrival;
>> > >> +    long sched_info_last_queued;
>> > >>      long page_objects;
>> > >>      long kmem_cache_oo;
>> > >>      long char_device_struct_cdev;
>> > >> diff --git a/symbols.c b/symbols.c
>> > >> index e30fafe..fb5035f 100644
>> > >> --- a/symbols.c
>> > >> +++ b/symbols.c
>> > >> @@ -9930,6 +9930,8 @@ dump_offset_table(char *spec, ulong makestruct)
>> > >>                    OFFSET(sched_rt_entity_run_list));
>> > >>      fprintf(fp, "       sched_info_last_arrival: %ld\n",
>> > >>                    OFFSET(sched_info_last_arrival));
>> > >> +    fprintf(fp, "       sched_info_last_queued: %ld\n",
>> > >> +            OFFSET(sched_info_last_queued));
>> > >>            fprintf(fp, "       task_struct_thread_info: %ld\n",
>> > >>                    OFFSET(task_struct_thread_info));
>> > >>            fprintf(fp, "             task_struct_stack: %ld\n",
>> > >> diff --git a/task.c b/task.c
>> > >> index 3bafe79..f5386ac 100644
>> > >> --- a/task.c
>> > >> +++ b/task.c
>> > >> @@ -332,9 +332,12 @@ task_init(void)
>> > >>            MEMBER_OFFSET_INIT(task_struct_last_run, "task_struct", 
>> > >> "last_run");
>> > >>            MEMBER_OFFSET_INIT(task_struct_timestamp, "task_struct", 
>> > >> "timestamp");
>> > >>            MEMBER_OFFSET_INIT(task_struct_sched_info, "task_struct", 
>> > >> "sched_info");
>> > >> -    if (VALID_MEMBER(task_struct_sched_info))
>> > >> +    if (VALID_MEMBER(task_struct_sched_info)) {
>> > >>              MEMBER_OFFSET_INIT(sched_info_last_arrival,
>> > >>                      "sched_info", "last_arrival");
>> > >> +            MEMBER_OFFSET_INIT(sched_info_last_queued,
>> > >> +                    "sched_info", "last_queued");
>> > >> +    }
>> > >>      if (VALID_MEMBER(task_struct_last_run) ||
>> > >>          VALID_MEMBER(task_struct_timestamp) ||
>> > >>          VALID_MEMBER(sched_info_last_arrival)) {
>> > >> @@ -6035,7 +6038,7 @@ ulonglong
>> > >>    task_last_run(ulong task)
>> > >>    {
>> > >>            ulong last_run;
>> > >> -    ulonglong timestamp;
>> > >> +    ulonglong timestamp,last_queued;
>> > >>
>> > >>      timestamp = 0;
>> > >>            fill_task_struct(task);
>> > >> @@ -6047,10 +6050,16 @@ task_last_run(ulong task)
>> > >>      } else if (VALID_MEMBER(task_struct_timestamp))
>> > >>              timestamp = tt->last_task_read ?  
>> > >> ULONGLONG(tt->task_struct +
>> > >>                      OFFSET(task_struct_timestamp)) : 0;
>> > >> -    else if (VALID_MEMBER(sched_info_last_arrival))
>> > >> -            timestamp = tt->last_task_read ?  
>> > >> ULONGLONG(tt->task_struct +
>> > >> -                    OFFSET(task_struct_sched_info) +
>> > >> -                    OFFSET(sched_info_last_arrival)) : 0;
>> > >> +    else if (VALID_MEMBER(sched_info_last_queued))
>> > >> +            last_queued = ULONGLONG(tt->task_struct +
>> > >> +                    OFFSET(task_struct_sched_info) +
>> > >> +                    OFFSET(sched_info_last_queued));
>> > >> +            if (last_queued != 0) {
>> > >> +                    timestamp = tt->last_task_read ? last_queued : 0;
>> > >> +            } else if (VALID_MEMBER(sched_info_last_arrival))
>> > >> +                            timestamp = tt->last_task_read ?  
>> > >> ULONGLONG(tt->task_struct +
>> > >> +                            OFFSET(task_struct_sched_info) +
>> > >> +                            OFFSET(sched_info_last_arrival)) : 0;
>> > >>
>> > >>            return timestamp;
>> > >>    }
>> > > --
>> > > Crash-utility mailing list -- devel@lists.crash-utility.osci.io
>> > > To unsubscribe send an email to devel-le...@lists.crash-utility.osci.io
>> > > https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
>> > > Contribution Guidelines: https://github.com/crash-utility/crash/wiki
>> > --
>> > Crash-utility mailing list -- devel@lists.crash-utility.osci.io
>> > To unsubscribe send an email to devel-le...@lists.crash-utility.osci.io
>> > https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
>> > Contribution Guidelines: https://github.com/crash-utility/crash/wiki
>>
>
>
> --
> Kenneth Yin
> Senior Software Maintenance Engineer
> Customer Experience and Engagement
> Phone: +86-10-6533-9459
> Red Hat China
--
Crash-utility mailing list -- devel@lists.crash-utility.osci.io
To unsubscribe send an email to devel-le...@lists.crash-utility.osci.io
https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
Contribution Guidelines: https://github.com/crash-utility/crash/wiki

[Crash-utility] Re: [PATCH] Fix the "ps -m" command shows wrong duration of RU task

Reply via email to