Hiramatsu-san,

thank you for reviewing my patch.

Thus wrote Masami Hiramatsu ([email protected]):

> Ah, this is a bit complicated. It seems to work with sched_switch event
> as commit f04dec93466a ("tracing/eprobes: Fix reading of string fields"):

> echo 'e:sw sched/sched_switch comm=$next_comm:string' > dynamic_events

> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> #              | |         |   |||||     |         |
>               sh-162     [002] d..3.    54.027213: sw: (sched.sched_switch) 
> comm="swapper/2"
>           <idle>-0       [007] d..3.    54.034573: sw: (sched.sched_switch) 
> comm="rcu_preempt"
>      rcu_preempt-15      [007] d..3.    54.034589: sw: (sched.sched_switch) 
> comm="swapper/7"

> Maybe comm is stored as a fixed string information in the event record?

Yes, this example does not execute my change.

> /sys/kernel/tracing # cat events/sched/sched_switch/format 
> name: sched_switch
> ID: 254
> format:
>       field:unsigned short common_type;       offset:0;       size:2; 
> signed:0;
>       field:unsigned char common_flags;       offset:2;       size:1; 
> signed:0;
>       field:unsigned char common_preempt_count;       offset:3;       size:1; 
> signed:0;
>       field:int common_pid;   offset:4;       size:4; signed:1;

>       field:char prev_comm[16];       offset:8;       size:16;        
> signed:0;
>       field:pid_t prev_pid;   offset:24;      size:4; signed:1;
>       field:int prev_prio;    offset:28;      size:4; signed:1;
>       field:long prev_state;  offset:32;      size:8; signed:1;
>       field:char next_comm[16];       offset:40;      size:16;        
> signed:0;
>       field:pid_t next_pid;   offset:56;      size:4; signed:1;
>       field:int next_prio;    offset:60;      size:4; signed:1;

> But the filename is a pointer.

> /sys/kernel/tracing # cat events/syscalls/sys_enter_openat/format 
> name: sys_enter_openat
> ID: 705
> format:
>       field:unsigned short common_type;       offset:0;       size:2; 
> signed:0;
>       field:unsigned char common_flags;       offset:2;       size:1; 
> signed:0;
>       field:unsigned char common_preempt_count;       offset:3;       size:1; 
> signed:0;
>       field:int common_pid;   offset:4;       size:4; signed:1;

>       field:int __syscall_nr; offset:8;       size:4; signed:1;
>       field:int dfd;  offset:16;      size:8; signed:0;
>       field:const char * filename;    offset:24;      size:8; signed:0;
>       field:int flags;        offset:32;      size:8; signed:0;
>       field:umode_t mode;     offset:40;      size:8; signed:0;
>       field:__data_loc char[] __filename_val; offset:48;      size:4; 
> signed:0;

> In this case, the filename field should use __data_loc directly instead of
> pointing data on the ring buffer.

> Can you try 

> echo 'e syscalls.sys_enter_openat $__filename_val:string' > \
>               /sys/kernel/tracing/dynamic_events

> Instead?

This field is working as expected.

I still believe that the handling of FILTER_PTR_STRING is not correct. The
pointer is stored in the ringbuffer as unsigned long and read as a char. This
gives us a truncated pointer that cannot be dereferenced.

> I think better solution is fixing sycall tracer.

I would say that syscall trace is doing the right thing. The ringbuffer entry
is a struct syscall_trace_enter, the syscall arguments are unsigned longs.
They are written in ftrace_syscall_enter, this looks correct to me.

A const char * syscall argument is using FILTER_PTR_STRING, the unsigned long
argument from the ringbuffer is read as a char and then converted to a
truncated pointer.

Thanks,
Martin

Reply via email to