RE: [PATCH] sched: don't output cpu sched info by default

2016-04-26 Thread Zhang, Yanmin
I double-check it.
Function show_state_filter chooses to print all tasks when
state_filter == 0 (TASK_RUNNING).
sysrq_sched_debug_show prints per cpu sched debug info.

They are different info.

It is better to use sched_debug_enabled to control the
2nd part output.

Yanmin

-Original Message-
From: Zhang, Yanmin 
Sent: Tuesday, April 26, 2016 9:04 AM
To: Afzal Mohammed; Zhang, LongX
Cc: mi...@redhat.com; pet...@infradead.org; linux-kernel@vger.kernel.org
Subject: RE: [PATCH] sched: don't output cpu sched info by default

Thanks for the pointer. We didn't notice that. That patch also fixes the issue 
indeed.

Yanmin

-Original Message-
From: Afzal Mohammed [mailto:afzal.mohd...@gmail.com]
Sent: Monday, April 25, 2016 6:50 PM
To: Zhang, LongX
Cc: mi...@redhat.com; pet...@infradead.org; linux-kernel@vger.kernel.org; 
Zhang, Yanmin
Subject: Re: [PATCH] sched: don't output cpu sched info by default

Hi,

On Mon, Apr 25, 2016 at 01:04:41PM +0800, Zhang Long wrote:

> Android userspace debug prcoess might dump system info by sysrq.
> One info is of cpu sched. Usually, one thread has one line dump.
> Such log is huge sometimes and such dumping spends lots of time and 
> make the system worse. Sometimes, watchdog resets the system in the 
> end.
> 
> The patch fixes it by dumping cpu sched info only when 
> sched_debug_enabled is true.

Doesn't "sched: don't dump sched debug info in SysRq-W" currently in 
tip/sched/core help ?,

http://lkml.kernel.org/r/1459777322-30902-1-git-send-email-rabin.vinc...@axis.com

Regards
afzal


RE: [PATCH] sched: don't output cpu sched info by default

2016-04-26 Thread Zhang, Yanmin
I double-check it.
Function show_state_filter chooses to print all tasks when
state_filter == 0 (TASK_RUNNING).
sysrq_sched_debug_show prints per cpu sched debug info.

They are different info.

It is better to use sched_debug_enabled to control the
2nd part output.

Yanmin

-Original Message-
From: Zhang, Yanmin 
Sent: Tuesday, April 26, 2016 9:04 AM
To: Afzal Mohammed; Zhang, LongX
Cc: mi...@redhat.com; pet...@infradead.org; linux-kernel@vger.kernel.org
Subject: RE: [PATCH] sched: don't output cpu sched info by default

Thanks for the pointer. We didn't notice that. That patch also fixes the issue 
indeed.

Yanmin

-Original Message-
From: Afzal Mohammed [mailto:afzal.mohd...@gmail.com]
Sent: Monday, April 25, 2016 6:50 PM
To: Zhang, LongX
Cc: mi...@redhat.com; pet...@infradead.org; linux-kernel@vger.kernel.org; 
Zhang, Yanmin
Subject: Re: [PATCH] sched: don't output cpu sched info by default

Hi,

On Mon, Apr 25, 2016 at 01:04:41PM +0800, Zhang Long wrote:

> Android userspace debug prcoess might dump system info by sysrq.
> One info is of cpu sched. Usually, one thread has one line dump.
> Such log is huge sometimes and such dumping spends lots of time and 
> make the system worse. Sometimes, watchdog resets the system in the 
> end.
> 
> The patch fixes it by dumping cpu sched info only when 
> sched_debug_enabled is true.

Doesn't "sched: don't dump sched debug info in SysRq-W" currently in 
tip/sched/core help ?,

http://lkml.kernel.org/r/1459777322-30902-1-git-send-email-rabin.vinc...@axis.com

Regards
afzal


RE: [PATCH] sched: don't output cpu sched info by default

2016-04-25 Thread Zhang, Yanmin
Thanks for the pointer. We didn't notice that. That patch also
fixes the issue indeed.

Yanmin

-Original Message-
From: Afzal Mohammed [mailto:afzal.mohd...@gmail.com] 
Sent: Monday, April 25, 2016 6:50 PM
To: Zhang, LongX
Cc: mi...@redhat.com; pet...@infradead.org; linux-kernel@vger.kernel.org; 
Zhang, Yanmin
Subject: Re: [PATCH] sched: don't output cpu sched info by default

Hi,

On Mon, Apr 25, 2016 at 01:04:41PM +0800, Zhang Long wrote:

> Android userspace debug prcoess might dump system info by sysrq.
> One info is of cpu sched. Usually, one thread has one line dump.
> Such log is huge sometimes and such dumping spends lots of time and 
> make the system worse. Sometimes, watchdog resets the system in the 
> end.
> 
> The patch fixes it by dumping cpu sched info only when 
> sched_debug_enabled is true.

Doesn't "sched: don't dump sched debug info in SysRq-W" currently in 
tip/sched/core help ?,

http://lkml.kernel.org/r/1459777322-30902-1-git-send-email-rabin.vinc...@axis.com

Regards
afzal


RE: [PATCH] sched: don't output cpu sched info by default

2016-04-25 Thread Zhang, Yanmin
Thanks for the pointer. We didn't notice that. That patch also
fixes the issue indeed.

Yanmin

-Original Message-
From: Afzal Mohammed [mailto:afzal.mohd...@gmail.com] 
Sent: Monday, April 25, 2016 6:50 PM
To: Zhang, LongX
Cc: mi...@redhat.com; pet...@infradead.org; linux-kernel@vger.kernel.org; 
Zhang, Yanmin
Subject: Re: [PATCH] sched: don't output cpu sched info by default

Hi,

On Mon, Apr 25, 2016 at 01:04:41PM +0800, Zhang Long wrote:

> Android userspace debug prcoess might dump system info by sysrq.
> One info is of cpu sched. Usually, one thread has one line dump.
> Such log is huge sometimes and such dumping spends lots of time and 
> make the system worse. Sometimes, watchdog resets the system in the 
> end.
> 
> The patch fixes it by dumping cpu sched info only when 
> sched_debug_enabled is true.

Doesn't "sched: don't dump sched debug info in SysRq-W" currently in 
tip/sched/core help ?,

http://lkml.kernel.org/r/1459777322-30902-1-git-send-email-rabin.vinc...@axis.com

Regards
afzal


Re: [PATCH] trace: correct start_index in find_next

2016-01-06 Thread Zhang, Yanmin
On 2016/1/7 11:50, Steven Rostedt wrote:
> On Thu, 07 Jan 2016 10:56:56 +0800
> "Zhang, Yanmin"  wrote:
>
>> How is this patch? It fixes a kernel panic.
>>
> Linus already pulled it. It's also marked for stable. See commit:
> f36d1be2930ede0a1947686e1126ffda5d5ee1bb in Linus's tree.
>
> I renamed the subject slightly to:
>
> "tracing: Fix setting of start_index in find_next()"
>
Thanks as usual.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] trace: correct start_index in find_next

2016-01-06 Thread Zhang, Yanmin
On 2015/12/31 13:11, Qiu, PeiyangX wrote:
> From: Qiu Peiyang 
>
> When we do cat /sys/kernel/debug/tracing/printk_formats, we hit kernel
> panic at t_show.
>
> general protection fault:  [#1] PREEMPT SMP 
> CPU: 0 PID: 2957 Comm: sh Tainted: G W  O 3.14.55-x86_64-01062-gd4acdc7 #2
> RIP: 0010:[] 
>  [] t_show+0x22/0xe0
> RSP: :88002b4ebe80  EFLAGS: 00010246
> RAX:  RBX:  RCX: 0004
> RDX: 0004 RSI: 81fd26a6 RDI: 880032f9f7b1
> RBP: 88002b4ebe98 R08: 1000 R09: ffec
> R10:  R11: 000f R12: 880004d9b6c0
> R13: 7365725f6d706400 R14: 880004d9b6c0 R15: 82020570
> FS:  () GS:88003aa0(0063) knlGS:f776bc40
> CS:  0010 DS: 002b ES: 002b CR0: 80050033
> CR2: f6c02ff0 CR3: 2c2b3000 CR4: 001007f0
> Call Trace:
>  [] seq_read+0x2f6/0x3e0
>  [] vfs_read+0x9b/0x160
>  [] SyS_read+0x49/0xb0
>  [] ia32_do_call+0x13/0x13
>  ---[ end trace 5bd9eb630614861e ]---
> Kernel panic - not syncing: Fatal exception
>
> When the first time find_next calls find_next_mod_format, it should 
> iterate the trace_bprintk_fmt_list to find the first print format of
> the module. However in current code, start_index is smaller than *pos
> at first, and code will not iterate the list. Latter container_of will
> get the wrong address with former v, which will cause mod_fmt be a
> meaningless object and so is the returned mod_fmt->fmt.
>
> This patch will fix it by correcting the start_index. After fixed,
> when the first time calls find_next_mod_format, start_index will be 
> equal to *pos, and code will iterate the trace_bprintk_fmt_list to 
> get the right module printk format, so is the returned mod_fmt->fmt.
>
> Signed-off-by: Qiu Peiyang 
> ---
>  kernel/trace/trace_printk.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/trace/trace_printk.c b/kernel/trace/trace_printk.c
> index 1c2b285..64f0a26 100644
> --- a/kernel/trace/trace_printk.c
> +++ b/kernel/trace/trace_printk.c
> @@ -273,6 +273,7 @@ static const char **find_next(void *v, loff_t *pos)
>   if (*pos < last_index + start_index)
>   return __start___tracepoint_str + (*pos - last_index);
>  
> + start_index += last_index;
>   return find_next_mod_format(start_index, v, fmt, pos);
>  }

Rusty, Steven,

How is this patch? It fixes a kernel panic.

Yanmin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] trace: correct start_index in find_next

2016-01-06 Thread Zhang, Yanmin
On 2015/12/31 13:11, Qiu, PeiyangX wrote:
> From: Qiu Peiyang 
>
> When we do cat /sys/kernel/debug/tracing/printk_formats, we hit kernel
> panic at t_show.
>
> general protection fault:  [#1] PREEMPT SMP 
> CPU: 0 PID: 2957 Comm: sh Tainted: G W  O 3.14.55-x86_64-01062-gd4acdc7 #2
> RIP: 0010:[] 
>  [] t_show+0x22/0xe0
> RSP: :88002b4ebe80  EFLAGS: 00010246
> RAX:  RBX:  RCX: 0004
> RDX: 0004 RSI: 81fd26a6 RDI: 880032f9f7b1
> RBP: 88002b4ebe98 R08: 1000 R09: ffec
> R10:  R11: 000f R12: 880004d9b6c0
> R13: 7365725f6d706400 R14: 880004d9b6c0 R15: 82020570
> FS:  () GS:88003aa0(0063) knlGS:f776bc40
> CS:  0010 DS: 002b ES: 002b CR0: 80050033
> CR2: f6c02ff0 CR3: 2c2b3000 CR4: 001007f0
> Call Trace:
>  [] seq_read+0x2f6/0x3e0
>  [] vfs_read+0x9b/0x160
>  [] SyS_read+0x49/0xb0
>  [] ia32_do_call+0x13/0x13
>  ---[ end trace 5bd9eb630614861e ]---
> Kernel panic - not syncing: Fatal exception
>
> When the first time find_next calls find_next_mod_format, it should 
> iterate the trace_bprintk_fmt_list to find the first print format of
> the module. However in current code, start_index is smaller than *pos
> at first, and code will not iterate the list. Latter container_of will
> get the wrong address with former v, which will cause mod_fmt be a
> meaningless object and so is the returned mod_fmt->fmt.
>
> This patch will fix it by correcting the start_index. After fixed,
> when the first time calls find_next_mod_format, start_index will be 
> equal to *pos, and code will iterate the trace_bprintk_fmt_list to 
> get the right module printk format, so is the returned mod_fmt->fmt.
>
> Signed-off-by: Qiu Peiyang 
> ---
>  kernel/trace/trace_printk.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/trace/trace_printk.c b/kernel/trace/trace_printk.c
> index 1c2b285..64f0a26 100644
> --- a/kernel/trace/trace_printk.c
> +++ b/kernel/trace/trace_printk.c
> @@ -273,6 +273,7 @@ static const char **find_next(void *v, loff_t *pos)
>   if (*pos < last_index + start_index)
>   return __start___tracepoint_str + (*pos - last_index);
>  
> + start_index += last_index;
>   return find_next_mod_format(start_index, v, fmt, pos);
>  }

Rusty, Steven,

How is this patch? It fixes a kernel panic.

Yanmin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] trace: correct start_index in find_next

2016-01-06 Thread Zhang, Yanmin
On 2016/1/7 11:50, Steven Rostedt wrote:
> On Thu, 07 Jan 2016 10:56:56 +0800
> "Zhang, Yanmin" <yanmin_zh...@linux.intel.com> wrote:
>
>> How is this patch? It fixes a kernel panic.
>>
> Linus already pulled it. It's also marked for stable. See commit:
> f36d1be2930ede0a1947686e1126ffda5d5ee1bb in Linus's tree.
>
> I renamed the subject slightly to:
>
> "tracing: Fix setting of start_index in find_next()"
>
Thanks as usual.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] module: deal with the failure of complete_formation

2016-01-05 Thread Zhang, Yanmin


On 2016/1/6 9:29, Steven Rostedt wrote:
> On Wed, 06 Jan 2016 09:14:42 +0800
> "Zhang, Yanmin"  wrote:
>
>
>> It's a good idea although ftrace_module_init_fail might be complicated.
> How so? I posted the patch. All it does is to call
> ftrace_release_mod(), which will only remove pages that are included
> with the module. If the pages were not added yet, nothing is done.

You are right. ftrace_release_mod can be used.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] module: deal with the failure of complete_formation

2016-01-05 Thread Zhang, Yanmin
On 2016/1/6 9:01, Steven Rostedt wrote:
> On Fri, 25 Dec 2015 15:03:13 +0800
> "Qiu, PeiyangX"  wrote:
>
>> From: Qiu Peiyang 
>>
>> complete_formation might fail. kernel need clean up
>> ftrace records of the module.
>>
>> The patch fixes it by tuning the operation sequence in
>> complete_formation. After complete_formation checks
>> verify_export_symbols, call ftrace_module_init to init
>> ftrace records.
>>
>> Signed-off-by: Qiu Peiyang 
>> Signed-off-by: Zhang Yanmin 
>> ---
>>  kernel/module.c | 9 +
>>  1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/module.c b/kernel/module.c
>> index 8f051a1..0a67c4e 100644
>> --- a/kernel/module.c
>> +++ b/kernel/module.c
>> @@ -3373,6 +3373,11 @@ static int complete_formation(struct module *mod, 
>> struct load_info *info)
>>  /* This relies on module_mutex for list integrity. */
>>  module_bug_finalize(info->hdr, info->sechdrs, mod);
>>  
>> +mutex_unlock(_mutex);
>> +
> First of all, this is buggy. You can't just move the locking of
> module_mutex. That is needed to modify mod->state.
>
> Second of all, you are solving this wrong. I guess we should add
> ftrace_module_init_fail() function to cover the cases where the module
> can fail before calling do_init_module(), as if that happens it does
> not call the module going notifiers.

It's a good idea although ftrace_module_init_fail might be complicated.

Thanks,
Yanmin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] module: deal with the failure of complete_formation

2016-01-05 Thread Zhang, Yanmin
On 2016/1/6 9:01, Steven Rostedt wrote:
> On Fri, 25 Dec 2015 15:03:13 +0800
> "Qiu, PeiyangX" <peiyangx@intel.com> wrote:
>
>> From: Qiu Peiyang <peiyangx@intel.com>
>>
>> complete_formation might fail. kernel need clean up
>> ftrace records of the module.
>>
>> The patch fixes it by tuning the operation sequence in
>> complete_formation. After complete_formation checks
>> verify_export_symbols, call ftrace_module_init to init
>> ftrace records.
>>
>> Signed-off-by: Qiu Peiyang <peiyangx@intel.com>
>> Signed-off-by: Zhang Yanmin <yanmin.zh...@intel.com>
>> ---
>>  kernel/module.c | 9 +
>>  1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/module.c b/kernel/module.c
>> index 8f051a1..0a67c4e 100644
>> --- a/kernel/module.c
>> +++ b/kernel/module.c
>> @@ -3373,6 +3373,11 @@ static int complete_formation(struct module *mod, 
>> struct load_info *info)
>>  /* This relies on module_mutex for list integrity. */
>>  module_bug_finalize(info->hdr, info->sechdrs, mod);
>>  
>> +mutex_unlock(_mutex);
>> +
> First of all, this is buggy. You can't just move the locking of
> module_mutex. That is needed to modify mod->state.
>
> Second of all, you are solving this wrong. I guess we should add
> ftrace_module_init_fail() function to cover the cases where the module
> can fail before calling do_init_module(), as if that happens it does
> not call the module going notifiers.

It's a good idea although ftrace_module_init_fail might be complicated.

Thanks,
Yanmin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] module: deal with the failure of complete_formation

2016-01-05 Thread Zhang, Yanmin


On 2016/1/6 9:29, Steven Rostedt wrote:
> On Wed, 06 Jan 2016 09:14:42 +0800
> "Zhang, Yanmin" <yanmin_zh...@linux.intel.com> wrote:
>
>
>> It's a good idea although ftrace_module_init_fail might be complicated.
> How so? I posted the patch. All it does is to call
> ftrace_release_mod(), which will only remove pages that are included
> with the module. If the pages were not added yet, nothing is done.

You are right. ftrace_release_mod can be used.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ftrace: fix race between ftrace and insmod

2015-12-16 Thread Zhang, Yanmin
On 2015/12/16 22:28, Steven Rostedt wrote:
> On Wed, 16 Dec 2015 18:28:35 +0800
> "Zhang, Yanmin"  wrote:
>
>>> +   /*
>>> +* If the tracing is enabled, go ahead and enable the record.
>>> +*
>>> +* The reason not to enable the record immediatelly is the
>>> +* inherent check of ftrace_make_nop/ftrace_make_call for
>>> +* correct previous instructions.  Making first the NOP
>>> +* conversion puts the module to the correct state, thus
>>> +* passing the ftrace_make_call check.
>>> +*
>>> +* We also delay this to after the module code already set the
>>> +* text to read-only, as we now need to set it back to read-write
>>> +* so that we can modify the text.
>>> +*/
>>> +   if (ftrace_start_up)
>>> +   ftrace_arch_code_modify_prepare();
>>> +
>>> +   do_for_each_ftrace_rec(pg, rec) {
>>> +   int cnt;
>>> +   /*
>>> +* do_for_each_ftrace_rec() is a double loop.
>>> +* module text shares the pg. If a record is
>>> +* not part of this module, then skip this pg,
>>> +* which the "break" will do.
>>> +*/
>>> +   if (!within_module_core(rec->ip, mod))
>>> +   break;
>>> +
>>> +   cnt = 0;
>>> +
>>> +   /*
>>> +* When adding a module, we need to check if tracers are
>>> +* currently enabled and if they are, and can trace this record,
>>> +* we need to enable the module functions as well as update the
>>> +* reference counts for those function records.
>>> +*/
>>> +   if (ftrace_start_up)
>>> +   cnt += referenced_filters(rec);
>>> +
>>> +   /* This clears FTRACE_FL_DISABLED */
>>> +   rec->flags = cnt;
>>> +
>>> +   if (ftrace_start_up && cnt) {
>>> +   int failed = __ftrace_replace_code(rec, 1);  
>> If we choose to call ftrace_module_enable when receiving module notification
>>   MODULE_STATE_COMING, TEXT section of the module is already changed to RO.
> And that's why we call ftrace_arch_code_modify_prepare(). That should
> change all text to RW.

Thanks for the kind pointer. 
We would add codes into your patch based on notifier and send patch to you by
private email.

Yanmin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ftrace: fix race between ftrace and insmod

2015-12-16 Thread Zhang, Yanmin

On 2015/12/16 1:37, Steven Rostedt wrote:
> On Tue, 15 Dec 2015 11:26:41 +0800
> "Zhang, Yanmin"  wrote:
>
>>> This seems very hackish, although I can't think of a better way at the
>>> moment. But I would like not to add more code into module.c if
>>> possible, and just use a notifier unless there's a real reason we can't
>>> (as there was when we added the hook in the first place).
>>> We would double-check it again. It's not a good idea to change common
>>> module codes.  
>> I double-checked it. If using notifier, we have to add 2 new module 
>> notification
>> events in enum module_state.
>> For example, MODULE_STATE_INITING, MODULE_STATE_INITING_FAILED.
>> At MODULE_STATE_INITING, call ftrace_module_init(mod);
>> At MODULE_STATE_INITING_FAILED, call ftrace_release_mod (or similar 
>> function);
>> At MODULE_STATE_COMING, call a new function which links the new start_pg
>> to ftrace_pages.
>> Such design can also fix another bug in current kernel that ftrace start_pg
>> of the module is not cleaned up if complete_formation fails.
>> However, we change module common codes. The new events only serve ftrace.
>>
>> If not holding ftrace_mutex across start_pg setup and complete_formation,
>> some other variables are not protected well, such like ftrace_pages, 
>> ftrace_start_up, and so on.
> OK, that was basically the reason the hook was added in the first place.
>
> The reason the modules notifier doesn't work here is that there's too
> many exits from the module code that may leave the mutex held.
>
> I thought about this some more. So the issue is that we add the module
> functions to the ftrace records. Then another task enables function
> tracing before the module is fully set up. As ftrace is enabling
> function tracing, the module sets its text to read-only, outside the
> scope of ftrace setting all text to read-write. When ftrace gets to the
> module code, it fails to do the modifications and you get the
> ftrace_bug() splat.
>
> Sounds right?
>
> Now, I really dislike the taking of the ftrace mutex and returning back
> to module handling. That is very subtle, fragile and prone to bugs. I
> took a step back to find another solution and I think I found one.
>
> Take the below patch and add to it (I'll keep this patch as mine, and
> you can submit another patch to do the following). You don't need to
> resend this patch, just send me a patch that does this:
>
>
> Add the module notifier that calls ftrace_module_enable(mod), removing
> it from ftrace_init_module(). Feel free to monkey with that function to
> be able to be called directly if it can't already.
>
> What my patch does is to create a new ftrace record flag DISABLED,
> which is set to all functions in a module as it is added. Then later on
> (in the module notifier that you will add), the flag is cleared. In
> between, the module functions will be ignored.
>
>
> If the module errors out and the notifier is not called, we don't care.
> The records will simply be removed, and the flag will be meaningless.
>
> I needed to move the code around to still let the module code get
> enabled if tracing was already enabled. I even cleaned that code up as
> I found it has gotten a bit messy.
>
> -- Steve
>
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 4736a826baf5..660e7c698f3b 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -357,6 +357,7 @@ bool is_ftrace_trampoline(unsigned long addr);
>   *  REGS- the record wants the function to save regs
>   *  REGS_EN - the function is set up to save regs.
>   *  IPMODIFY - the record allows for the IP address to be changed.
> + *  DISABLED - the record is not ready to be touched yet
>   *
>   * When a new ftrace_ops is registered and wants a function to save
>   * pt_regs, the rec->flag REGS is set. When the function has been
> @@ -371,10 +372,11 @@ enum {
>   FTRACE_FL_TRAMP = (1UL << 28),
>   FTRACE_FL_TRAMP_EN  = (1UL << 27),
>   FTRACE_FL_IPMODIFY  = (1UL << 26),
> + FTRACE_FL_DISABLED  = (1UL << 25),
>  };
>  
> -#define FTRACE_REF_MAX_SHIFT 26
> -#define FTRACE_FL_BITS   6
> +#define FTRACE_REF_MAX_SHIFT 25
> +#define FTRACE_FL_BITS   7
>  #define FTRACE_FL_MASKED_BITS((1UL << FTRACE_FL_BITS) - 1)
>  #define FTRACE_FL_MASK   (FTRACE_FL_MASKED_BITS << 
> FTRACE_REF_MAX_SHIFT)
>  #define FTRACE_REF_MAX   ((1UL << FTRACE_REF_MAX_SHIFT) - 1)
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index e290a30f2d0b..e7e15874fb7c 1

Re: [PATCH] ftrace: fix race between ftrace and insmod

2015-12-16 Thread Zhang, Yanmin
On 2015/12/16 22:28, Steven Rostedt wrote:
> On Wed, 16 Dec 2015 18:28:35 +0800
> "Zhang, Yanmin" <yanmin_zh...@linux.intel.com> wrote:
>
>>> +   /*
>>> +* If the tracing is enabled, go ahead and enable the record.
>>> +*
>>> +* The reason not to enable the record immediatelly is the
>>> +* inherent check of ftrace_make_nop/ftrace_make_call for
>>> +* correct previous instructions.  Making first the NOP
>>> +* conversion puts the module to the correct state, thus
>>> +* passing the ftrace_make_call check.
>>> +*
>>> +* We also delay this to after the module code already set the
>>> +* text to read-only, as we now need to set it back to read-write
>>> +* so that we can modify the text.
>>> +*/
>>> +   if (ftrace_start_up)
>>> +   ftrace_arch_code_modify_prepare();
>>> +
>>> +   do_for_each_ftrace_rec(pg, rec) {
>>> +   int cnt;
>>> +   /*
>>> +* do_for_each_ftrace_rec() is a double loop.
>>> +* module text shares the pg. If a record is
>>> +* not part of this module, then skip this pg,
>>> +* which the "break" will do.
>>> +*/
>>> +   if (!within_module_core(rec->ip, mod))
>>> +   break;
>>> +
>>> +   cnt = 0;
>>> +
>>> +   /*
>>> +* When adding a module, we need to check if tracers are
>>> +* currently enabled and if they are, and can trace this record,
>>> +* we need to enable the module functions as well as update the
>>> +* reference counts for those function records.
>>> +*/
>>> +   if (ftrace_start_up)
>>> +   cnt += referenced_filters(rec);
>>> +
>>> +   /* This clears FTRACE_FL_DISABLED */
>>> +   rec->flags = cnt;
>>> +
>>> +   if (ftrace_start_up && cnt) {
>>> +   int failed = __ftrace_replace_code(rec, 1);  
>> If we choose to call ftrace_module_enable when receiving module notification
>>   MODULE_STATE_COMING, TEXT section of the module is already changed to RO.
> And that's why we call ftrace_arch_code_modify_prepare(). That should
> change all text to RW.

Thanks for the kind pointer. 
We would add codes into your patch based on notifier and send patch to you by
private email.

Yanmin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ftrace: fix race between ftrace and insmod

2015-12-16 Thread Zhang, Yanmin

On 2015/12/16 1:37, Steven Rostedt wrote:
> On Tue, 15 Dec 2015 11:26:41 +0800
> "Zhang, Yanmin" <yanmin_zh...@linux.intel.com> wrote:
>
>>> This seems very hackish, although I can't think of a better way at the
>>> moment. But I would like not to add more code into module.c if
>>> possible, and just use a notifier unless there's a real reason we can't
>>> (as there was when we added the hook in the first place).
>>> We would double-check it again. It's not a good idea to change common
>>> module codes.  
>> I double-checked it. If using notifier, we have to add 2 new module 
>> notification
>> events in enum module_state.
>> For example, MODULE_STATE_INITING, MODULE_STATE_INITING_FAILED.
>> At MODULE_STATE_INITING, call ftrace_module_init(mod);
>> At MODULE_STATE_INITING_FAILED, call ftrace_release_mod (or similar 
>> function);
>> At MODULE_STATE_COMING, call a new function which links the new start_pg
>> to ftrace_pages.
>> Such design can also fix another bug in current kernel that ftrace start_pg
>> of the module is not cleaned up if complete_formation fails.
>> However, we change module common codes. The new events only serve ftrace.
>>
>> If not holding ftrace_mutex across start_pg setup and complete_formation,
>> some other variables are not protected well, such like ftrace_pages, 
>> ftrace_start_up, and so on.
> OK, that was basically the reason the hook was added in the first place.
>
> The reason the modules notifier doesn't work here is that there's too
> many exits from the module code that may leave the mutex held.
>
> I thought about this some more. So the issue is that we add the module
> functions to the ftrace records. Then another task enables function
> tracing before the module is fully set up. As ftrace is enabling
> function tracing, the module sets its text to read-only, outside the
> scope of ftrace setting all text to read-write. When ftrace gets to the
> module code, it fails to do the modifications and you get the
> ftrace_bug() splat.
>
> Sounds right?
>
> Now, I really dislike the taking of the ftrace mutex and returning back
> to module handling. That is very subtle, fragile and prone to bugs. I
> took a step back to find another solution and I think I found one.
>
> Take the below patch and add to it (I'll keep this patch as mine, and
> you can submit another patch to do the following). You don't need to
> resend this patch, just send me a patch that does this:
>
>
> Add the module notifier that calls ftrace_module_enable(mod), removing
> it from ftrace_init_module(). Feel free to monkey with that function to
> be able to be called directly if it can't already.
>
> What my patch does is to create a new ftrace record flag DISABLED,
> which is set to all functions in a module as it is added. Then later on
> (in the module notifier that you will add), the flag is cleared. In
> between, the module functions will be ignored.
>
>
> If the module errors out and the notifier is not called, we don't care.
> The records will simply be removed, and the flag will be meaningless.
>
> I needed to move the code around to still let the module code get
> enabled if tracing was already enabled. I even cleaned that code up as
> I found it has gotten a bit messy.
>
> -- Steve
>
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 4736a826baf5..660e7c698f3b 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -357,6 +357,7 @@ bool is_ftrace_trampoline(unsigned long addr);
>   *  REGS- the record wants the function to save regs
>   *  REGS_EN - the function is set up to save regs.
>   *  IPMODIFY - the record allows for the IP address to be changed.
> + *  DISABLED - the record is not ready to be touched yet
>   *
>   * When a new ftrace_ops is registered and wants a function to save
>   * pt_regs, the rec->flag REGS is set. When the function has been
> @@ -371,10 +372,11 @@ enum {
>   FTRACE_FL_TRAMP = (1UL << 28),
>   FTRACE_FL_TRAMP_EN  = (1UL << 27),
>   FTRACE_FL_IPMODIFY  = (1UL << 26),
> + FTRACE_FL_DISABLED  = (1UL << 25),
>  };
>  
> -#define FTRACE_REF_MAX_SHIFT 26
> -#define FTRACE_FL_BITS   6
> +#define FTRACE_REF_MAX_SHIFT 25
> +#define FTRACE_FL_BITS   7
>  #define FTRACE_FL_MASKED_BITS((1UL << FTRACE_FL_BITS) - 1)
>  #define FTRACE_FL_MASK   (FTRACE_FL_MASKED_BITS << 
> FTRACE_REF_MAX_SHIFT)
>  #define FTRACE_REF_MAX   ((1UL << FTRACE_REF_MAX_SHIFT) - 1)
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
>

Re: [PATCH] ftrace: fix race between ftrace and insmod

2015-12-15 Thread Zhang, Yanmin
On 2015/12/16 1:37, Steven Rostedt wrote:
> On Tue, 15 Dec 2015 11:26:41 +0800
> "Zhang, Yanmin"  wrote:
>
>>> This seems very hackish, although I can't think of a better way at the
>>> moment. But I would like not to add more code into module.c if
>>> possible, and just use a notifier unless there's a real reason we can't
>>> (as there was when we added the hook in the first place).
>>> We would double-check it again. It's not a good idea to change common
>>> module codes.  
>> I double-checked it. If using notifier, we have to add 2 new module 
>> notification
>> events in enum module_state.
>> For example, MODULE_STATE_INITING, MODULE_STATE_INITING_FAILED.
>> At MODULE_STATE_INITING, call ftrace_module_init(mod);
>> At MODULE_STATE_INITING_FAILED, call ftrace_release_mod (or similar 
>> function);
>> At MODULE_STATE_COMING, call a new function which links the new start_pg
>> to ftrace_pages.
>> Such design can also fix another bug in current kernel that ftrace start_pg
>> of the module is not cleaned up if complete_formation fails.
>> However, we change module common codes. The new events only serve ftrace.
>>
>> If not holding ftrace_mutex across start_pg setup and complete_formation,
>> some other variables are not protected well, such like ftrace_pages, 
>> ftrace_start_up, and so on.
> OK, that was basically the reason the hook was added in the first place.
>
> The reason the modules notifier doesn't work here is that there's too
> many exits from the module code that may leave the mutex held.

Yes.

>
> I thought about this some more. So the issue is that we add the module
> functions to the ftrace records. Then another task enables function
> tracing before the module is fully set up. As ftrace is enabling
> function tracing, the module sets its text to read-only, outside the
> scope of ftrace setting all text to read-write. When ftrace gets to the
> module code, it fails to do the modifications and you get the
> ftrace_bug() splat.
>
> Sounds right?

Right, indeed.
My case is worse than that as my android uses kernel 3.14.50, which has another
bug around remove_breakpoint as it calls probe_kernel_write directly. The
latest upstream has no such issue. So we just need focus on the race you
explain well.

>
> Now, I really dislike the taking of the ftrace mutex and returning back
> to module handling. That is very subtle, fragile and prone to bugs.

Yes, just like that the current codes are not perfect. :)

>  I
> took a step back to find another solution and I think I found one.
>
> Take the below patch and add to it (I'll keep this patch as mine, and
> you can submit another patch to do the following). You don't need to
> resend this patch, just send me a patch that does this:

Ok.

>
>
> Add the module notifier that calls ftrace_module_enable(mod), removing
> it from ftrace_init_module(). Feel free to monkey with that function to
> be able to be called directly if it can't already.
>
> What my patch does is to create a new ftrace record flag DISABLED,
> which is set to all functions in a module as it is added. Then later on
> (in the module notifier that you will add), the flag is cleared. In
> between, the module functions will be ignored.

It's a good idea.


>
> If the module errors out and the notifier is not called, we don't care.
> The records will simply be removed, and the flag will be meaningless.

One question: if complete_formation fails, load_module
wouldn't send any notification and ftrace_release_mod is not called.
How can ftrace core remove all pg->records of the module?

If complete_formation succeeds, further calls cause module errors out,
load_module would send MODULE_STATE_GOING, ftrace_release_mod is called.

Thanks for your quick response.

Yanmin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ftrace: fix race between ftrace and insmod

2015-12-15 Thread Zhang, Yanmin
On 2015/12/16 1:37, Steven Rostedt wrote:
> On Tue, 15 Dec 2015 11:26:41 +0800
> "Zhang, Yanmin" <yanmin_zh...@linux.intel.com> wrote:
>
>>> This seems very hackish, although I can't think of a better way at the
>>> moment. But I would like not to add more code into module.c if
>>> possible, and just use a notifier unless there's a real reason we can't
>>> (as there was when we added the hook in the first place).
>>> We would double-check it again. It's not a good idea to change common
>>> module codes.  
>> I double-checked it. If using notifier, we have to add 2 new module 
>> notification
>> events in enum module_state.
>> For example, MODULE_STATE_INITING, MODULE_STATE_INITING_FAILED.
>> At MODULE_STATE_INITING, call ftrace_module_init(mod);
>> At MODULE_STATE_INITING_FAILED, call ftrace_release_mod (or similar 
>> function);
>> At MODULE_STATE_COMING, call a new function which links the new start_pg
>> to ftrace_pages.
>> Such design can also fix another bug in current kernel that ftrace start_pg
>> of the module is not cleaned up if complete_formation fails.
>> However, we change module common codes. The new events only serve ftrace.
>>
>> If not holding ftrace_mutex across start_pg setup and complete_formation,
>> some other variables are not protected well, such like ftrace_pages, 
>> ftrace_start_up, and so on.
> OK, that was basically the reason the hook was added in the first place.
>
> The reason the modules notifier doesn't work here is that there's too
> many exits from the module code that may leave the mutex held.

Yes.

>
> I thought about this some more. So the issue is that we add the module
> functions to the ftrace records. Then another task enables function
> tracing before the module is fully set up. As ftrace is enabling
> function tracing, the module sets its text to read-only, outside the
> scope of ftrace setting all text to read-write. When ftrace gets to the
> module code, it fails to do the modifications and you get the
> ftrace_bug() splat.
>
> Sounds right?

Right, indeed.
My case is worse than that as my android uses kernel 3.14.50, which has another
bug around remove_breakpoint as it calls probe_kernel_write directly. The
latest upstream has no such issue. So we just need focus on the race you
explain well.

>
> Now, I really dislike the taking of the ftrace mutex and returning back
> to module handling. That is very subtle, fragile and prone to bugs.

Yes, just like that the current codes are not perfect. :)

>  I
> took a step back to find another solution and I think I found one.
>
> Take the below patch and add to it (I'll keep this patch as mine, and
> you can submit another patch to do the following). You don't need to
> resend this patch, just send me a patch that does this:

Ok.

>
>
> Add the module notifier that calls ftrace_module_enable(mod), removing
> it from ftrace_init_module(). Feel free to monkey with that function to
> be able to be called directly if it can't already.
>
> What my patch does is to create a new ftrace record flag DISABLED,
> which is set to all functions in a module as it is added. Then later on
> (in the module notifier that you will add), the flag is cleared. In
> between, the module functions will be ignored.

It's a good idea.


>
> If the module errors out and the notifier is not called, we don't care.
> The records will simply be removed, and the flag will be meaningless.

One question: if complete_formation fails, load_module
wouldn't send any notification and ftrace_release_mod is not called.
How can ftrace core remove all pg->records of the module?

If complete_formation succeeds, further calls cause module errors out,
load_module would send MODULE_STATE_GOING, ftrace_release_mod is called.

Thanks for your quick response.

Yanmin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ftrace: fix race between ftrace and insmod

2015-12-14 Thread Zhang, Yanmin
On 2015/12/15 9:05, Zhang, Yanmin wrote:
> On 2015/12/14 23:51, Steven Rostedt wrote:
>> On Mon, 14 Dec 2015 11:16:18 +0800
>> "Qiu, PeiyangX"  wrote:
>>
>>> We hit ftrace_bug report when booting Android on a 64bit ATOM SOC chip.
>>> Basically, there is a race between insmod and ftrace_run_update_code.
>>>
>>> After load_module=>ftrace_module_init, another thread jumps in to call
>>> ftrace_run_update_code=>ftrace_arch_code_modify_prepare
>>>  =>set_all_modules_text_rw, to change all modules  
>>> as RW. Since the new module is at MODULE_STATE_UNFORMED, the text attribute
>>> is not changed. Then, the 2nd thread goes ahead to change codes.
>>> However, load_module continues to call 
>>> complete_formation=>set_section_ro_nx,  
>>> then 2nd thread would fail when probing the module's TEXT.
>>>
>>> Below patch tries to resolve it.
>>>
>>> commit a949ae560a511fe4e3adf48fa44fefded93e5c2b
>>> Author: Steven Rostedt (Red Hat) 
>>> Date:   Thu Apr 24 10:40:12 2014 -0400
>>>
>>>  ftrace/module: Hardcode ftrace_module_init() call into load_module()
>>>
>>> But it can't fully resolve the issue.
>>>
>>> THis patch holds ftrace_mutex across ftrace_module_init and
>>> complete_formation.
>>>
>>> Signed-off-by: Qiu Peiyang 
>>> Signed-off-by: Zhang Yanmin 
>> First, this patch has major whitespace damage. All tabs are now spaces!
> This seems very hackish, although I can't think of a better way at the
> moment. But I would like not to add more code into module.c if
> possible, and just use a notifier unless there's a real reason we can't
> (as there was when we added the hook in the first place).
> We would double-check it again. It's not a good idea to change common
> module codes.

I double-checked it. If using notifier, we have to add 2 new module notification
events in enum module_state.
For example, MODULE_STATE_INITING, MODULE_STATE_INITING_FAILED.
At MODULE_STATE_INITING, call ftrace_module_init(mod);
At MODULE_STATE_INITING_FAILED, call ftrace_release_mod (or similar function);
At MODULE_STATE_COMING, call a new function which links the new start_pg
to ftrace_pages.
Such design can also fix another bug in current kernel that ftrace start_pg
of the module is not cleaned up if complete_formation fails.
However, we change module common codes. The new events only serve ftrace.

If not holding ftrace_mutex across start_pg setup and complete_formation,
some other variables are not protected well, such like ftrace_pages, 
ftrace_start_up, and so on.

Yanmin
 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ftrace: fix race between ftrace and insmod

2015-12-14 Thread Zhang, Yanmin
On 2015/12/14 23:51, Steven Rostedt wrote:
> On Mon, 14 Dec 2015 11:16:18 +0800
> "Qiu, PeiyangX"  wrote:
>
>> We hit ftrace_bug report when booting Android on a 64bit ATOM SOC chip.
>> Basically, there is a race between insmod and ftrace_run_update_code.
>>
>> After load_module=>ftrace_module_init, another thread jumps in to call
>> ftrace_run_update_code=>ftrace_arch_code_modify_prepare
>>  =>set_all_modules_text_rw, to change all modules  
>> as RW. Since the new module is at MODULE_STATE_UNFORMED, the text attribute
>> is not changed. Then, the 2nd thread goes ahead to change codes.
>> However, load_module continues to call 
>> complete_formation=>set_section_ro_nx,  
>> then 2nd thread would fail when probing the module's TEXT.
>>
>> Below patch tries to resolve it.
>>
>> commit a949ae560a511fe4e3adf48fa44fefded93e5c2b
>> Author: Steven Rostedt (Red Hat) 
>> Date:   Thu Apr 24 10:40:12 2014 -0400
>>
>>  ftrace/module: Hardcode ftrace_module_init() call into load_module()
>>
>> But it can't fully resolve the issue.
>>
>> THis patch holds ftrace_mutex across ftrace_module_init and
>> complete_formation.
>>
>> Signed-off-by: Qiu Peiyang 
>> Signed-off-by: Zhang Yanmin 
> First, this patch has major whitespace damage. All tabs are now spaces!

Sorry. That's really bad. It's the 1st time for Peiyang to send patch to LKML.
He would fix such issue by reconfiguring email client.

>
>> ---
>>   include/linux/ftrace.h |  6 --
>>   kernel/module.c|  3 ++-
>>   kernel/trace/ftrace.c  | 33 ++---
>>   3 files changed, 28 insertions(+), 14 deletions(-)
>>
>> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
>> index eae6548..4adc279 100644
>> --- a/include/linux/ftrace.h
>> +++ b/include/linux/ftrace.h
>> @@ -585,7 +585,8 @@ static inline int ftrace_modify_call(struct 
>> dyn_ftrace *rec, unsigned long old_a
>>   extern int ftrace_arch_read_dyn_info(char *buf, int size);
>>
>>   extern int skip_trace(unsigned long ip);
>> -extern void ftrace_module_init(struct module *mod);
>> +extern void ftrace_module_init_start(struct module *mod);
>> +extern void ftrace_module_init_end(struct module *mod);
>>
>>   extern void ftrace_disable_daemon(void);
>>   extern void ftrace_enable_daemon(void);
>> @@ -595,7 +596,8 @@ static inline int ftrace_force_update(void) { return 
>> 0; }
>>   static inline void ftrace_disable_daemon(void) { }
>>   static inline void ftrace_enable_daemon(void) { }
>>   static inline void ftrace_release_mod(struct module *mod) {}
>> -static inline void ftrace_module_init(struct module *mod) {}
>> +static inline void ftrace_module_init_start(struct module *mod) {}
>> +static inline void ftrace_module_init_end(struct module *mod) {}
>>   static inline __init int register_ftrace_command(struct 
>> ftrace_func_command *cmd)
>>   {
>>   return -EINVAL;
>> diff --git a/kernel/module.c b/kernel/module.c
>> index 8f051a1..324c5c6 100644
>> --- a/kernel/module.c
>> +++ b/kernel/module.c
>> @@ -3506,10 +3506,11 @@ static int load_module(struct load_info *info, 
>> const char __user *uargs,
>>   dynamic_debug_setup(info->debug, info->num_debug);
>>
>>   /* Ftrace init must be called in the MODULE_STATE_UNFORMED state */
>> -ftrace_module_init(mod);
>> +ftrace_module_init_start(mod);
>>
>>   /* Finally it's fully formed, ready to start executing. */
>>   err = complete_formation(mod, info);
>> +ftrace_module_init_end(mod);
> Instead of modifying the module code even more, why not just put this
> into the module notifier when done?
>
> Then this would only need to change the ftrace code. Although, holding
> a lock throughout seems rather hacky.

We did consider notifier. When ftrace_module_init is called, start_pg of the
module is linked to ftrace_pages. To fix the issue, we have to either add a
new var to struct module to save its start_pg, or change ftrace to save
module start_pg before MODULE_STATE_COMING notification is delivered.
In addition, function complete_formation need consider cleanup the
start_pg if init fails.

>
>
>
>>   if (err)
>>   goto ddebug_cleanup;
>>
>> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
>> index 3f743b1..436c199 100644
>> --- a/kernel/trace/ftrace.c
>> +++ b/kernel/trace/ftrace.c
>> @@ -4795,7 +4795,7 @@ static int ftrace_cmp_ips(const void *a, const 
>> void *b)
>&

Re: [PATCH] ftrace: fix race between ftrace and insmod

2015-12-14 Thread Zhang, Yanmin
On 2015/12/14 23:51, Steven Rostedt wrote:
> On Mon, 14 Dec 2015 11:16:18 +0800
> "Qiu, PeiyangX" <peiyangx@intel.com> wrote:
>
>> We hit ftrace_bug report when booting Android on a 64bit ATOM SOC chip.
>> Basically, there is a race between insmod and ftrace_run_update_code.
>>
>> After load_module=>ftrace_module_init, another thread jumps in to call
>> ftrace_run_update_code=>ftrace_arch_code_modify_prepare
>>  =>set_all_modules_text_rw, to change all modules  
>> as RW. Since the new module is at MODULE_STATE_UNFORMED, the text attribute
>> is not changed. Then, the 2nd thread goes ahead to change codes.
>> However, load_module continues to call 
>> complete_formation=>set_section_ro_nx,  
>> then 2nd thread would fail when probing the module's TEXT.
>>
>> Below patch tries to resolve it.
>>
>> commit a949ae560a511fe4e3adf48fa44fefded93e5c2b
>> Author: Steven Rostedt (Red Hat) <rost...@goodmis.org>
>> Date:   Thu Apr 24 10:40:12 2014 -0400
>>
>>  ftrace/module: Hardcode ftrace_module_init() call into load_module()
>>
>> But it can't fully resolve the issue.
>>
>> THis patch holds ftrace_mutex across ftrace_module_init and
>> complete_formation.
>>
>> Signed-off-by: Qiu Peiyang <peiyangx@intel.com>
>> Signed-off-by: Zhang Yanmin <yanmin.zh...@intel.com>
> First, this patch has major whitespace damage. All tabs are now spaces!

Sorry. That's really bad. It's the 1st time for Peiyang to send patch to LKML.
He would fix such issue by reconfiguring email client.

>
>> ---
>>   include/linux/ftrace.h |  6 --
>>   kernel/module.c|  3 ++-
>>   kernel/trace/ftrace.c  | 33 ++---
>>   3 files changed, 28 insertions(+), 14 deletions(-)
>>
>> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
>> index eae6548..4adc279 100644
>> --- a/include/linux/ftrace.h
>> +++ b/include/linux/ftrace.h
>> @@ -585,7 +585,8 @@ static inline int ftrace_modify_call(struct 
>> dyn_ftrace *rec, unsigned long old_a
>>   extern int ftrace_arch_read_dyn_info(char *buf, int size);
>>
>>   extern int skip_trace(unsigned long ip);
>> -extern void ftrace_module_init(struct module *mod);
>> +extern void ftrace_module_init_start(struct module *mod);
>> +extern void ftrace_module_init_end(struct module *mod);
>>
>>   extern void ftrace_disable_daemon(void);
>>   extern void ftrace_enable_daemon(void);
>> @@ -595,7 +596,8 @@ static inline int ftrace_force_update(void) { return 
>> 0; }
>>   static inline void ftrace_disable_daemon(void) { }
>>   static inline void ftrace_enable_daemon(void) { }
>>   static inline void ftrace_release_mod(struct module *mod) {}
>> -static inline void ftrace_module_init(struct module *mod) {}
>> +static inline void ftrace_module_init_start(struct module *mod) {}
>> +static inline void ftrace_module_init_end(struct module *mod) {}
>>   static inline __init int register_ftrace_command(struct 
>> ftrace_func_command *cmd)
>>   {
>>   return -EINVAL;
>> diff --git a/kernel/module.c b/kernel/module.c
>> index 8f051a1..324c5c6 100644
>> --- a/kernel/module.c
>> +++ b/kernel/module.c
>> @@ -3506,10 +3506,11 @@ static int load_module(struct load_info *info, 
>> const char __user *uargs,
>>   dynamic_debug_setup(info->debug, info->num_debug);
>>
>>   /* Ftrace init must be called in the MODULE_STATE_UNFORMED state */
>> -ftrace_module_init(mod);
>> +ftrace_module_init_start(mod);
>>
>>   /* Finally it's fully formed, ready to start executing. */
>>   err = complete_formation(mod, info);
>> +ftrace_module_init_end(mod);
> Instead of modifying the module code even more, why not just put this
> into the module notifier when done?
>
> Then this would only need to change the ftrace code. Although, holding
> a lock throughout seems rather hacky.

We did consider notifier. When ftrace_module_init is called, start_pg of the
module is linked to ftrace_pages. To fix the issue, we have to either add a
new var to struct module to save its start_pg, or change ftrace to save
module start_pg before MODULE_STATE_COMING notification is delivered.
In addition, function complete_formation need consider cleanup the
start_pg if init fails.

>
>
>
>>   if (err)
>>   goto ddebug_cleanup;
>>
>> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
>> index 3f743b1..436c199 100644
>> --- a/kernel/trace/ftrace.c
>> +++ b/kernel/trace/ftrace.c
&

Re: [PATCH] ftrace: fix race between ftrace and insmod

2015-12-14 Thread Zhang, Yanmin
On 2015/12/15 9:05, Zhang, Yanmin wrote:
> On 2015/12/14 23:51, Steven Rostedt wrote:
>> On Mon, 14 Dec 2015 11:16:18 +0800
>> "Qiu, PeiyangX" <peiyangx@intel.com> wrote:
>>
>>> We hit ftrace_bug report when booting Android on a 64bit ATOM SOC chip.
>>> Basically, there is a race between insmod and ftrace_run_update_code.
>>>
>>> After load_module=>ftrace_module_init, another thread jumps in to call
>>> ftrace_run_update_code=>ftrace_arch_code_modify_prepare
>>>  =>set_all_modules_text_rw, to change all modules  
>>> as RW. Since the new module is at MODULE_STATE_UNFORMED, the text attribute
>>> is not changed. Then, the 2nd thread goes ahead to change codes.
>>> However, load_module continues to call 
>>> complete_formation=>set_section_ro_nx,  
>>> then 2nd thread would fail when probing the module's TEXT.
>>>
>>> Below patch tries to resolve it.
>>>
>>> commit a949ae560a511fe4e3adf48fa44fefded93e5c2b
>>> Author: Steven Rostedt (Red Hat) <rost...@goodmis.org>
>>> Date:   Thu Apr 24 10:40:12 2014 -0400
>>>
>>>  ftrace/module: Hardcode ftrace_module_init() call into load_module()
>>>
>>> But it can't fully resolve the issue.
>>>
>>> THis patch holds ftrace_mutex across ftrace_module_init and
>>> complete_formation.
>>>
>>> Signed-off-by: Qiu Peiyang <peiyangx@intel.com>
>>> Signed-off-by: Zhang Yanmin <yanmin.zh...@intel.com>
>> First, this patch has major whitespace damage. All tabs are now spaces!
> This seems very hackish, although I can't think of a better way at the
> moment. But I would like not to add more code into module.c if
> possible, and just use a notifier unless there's a real reason we can't
> (as there was when we added the hook in the first place).
> We would double-check it again. It's not a good idea to change common
> module codes.

I double-checked it. If using notifier, we have to add 2 new module notification
events in enum module_state.
For example, MODULE_STATE_INITING, MODULE_STATE_INITING_FAILED.
At MODULE_STATE_INITING, call ftrace_module_init(mod);
At MODULE_STATE_INITING_FAILED, call ftrace_release_mod (or similar function);
At MODULE_STATE_COMING, call a new function which links the new start_pg
to ftrace_pages.
Such design can also fix another bug in current kernel that ftrace start_pg
of the module is not cleaned up if complete_formation fails.
However, we change module common codes. The new events only serve ftrace.

If not holding ftrace_mutex across start_pg setup and complete_formation,
some other variables are not protected well, such like ftrace_pages, 
ftrace_start_up, and so on.

Yanmin
 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] smpboot.c: move setup_vector_irq after set_cpu_online

2015-07-02 Thread Zhang, Yanmin
On 2015/7/2 15:02, Xiao, Jin wrote:
> Hi Joerg,
>
> On 7/2/2015 2:52 PM, Joerg Roedel wrote:
>> Hi Jin,
>>
>> On Thu, Jul 02, 2015 at 12:24:34PM +0800, xiao jin wrote:
>>> [  106.107851] BUG: unable to handle kernel NULL pointer dereference at 
>>> 0040
>>> [  106.116702] IP:
>>> [  106.118490]  [] 
>>> check_irq_vectors_for_cpu_disable+0x76/0x180
>> This might be the same issue I fixed with:
>>
>> d97eb8966c91 x86/irq: Check for valid irq descriptor in 
>> check_irq_vectors_for_cpu_disable()
>>
>> Can you try whether applying this patch on your kernel fixes your
>> issue?
>>
> Yes, I agree d97eb8966c91 fix the IPANIC. I just notice a description from 
> http://lkml.kernel.org/r/20150204132754.ga10...@suse.de.
>
> I wish to share the debug result and root cause from my side.

commit d97eb8966c91f2c9d05f0a22eb89ed5b76d966d1 looks like a workaround.

Could Jin's patch be merged also?
free_msi_irqs does have a race with smp_callin=>..=>__setup_vector_irq.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] smpboot.c: move setup_vector_irq after set_cpu_online

2015-07-02 Thread Zhang, Yanmin
On 2015/7/2 15:02, Xiao, Jin wrote:
 Hi Joerg,

 On 7/2/2015 2:52 PM, Joerg Roedel wrote:
 Hi Jin,

 On Thu, Jul 02, 2015 at 12:24:34PM +0800, xiao jin wrote:
 [  106.107851] BUG: unable to handle kernel NULL pointer dereference at 
 0040
 [  106.116702] IP:
 [  106.118490]  [810044f6] 
 check_irq_vectors_for_cpu_disable+0x76/0x180
 This might be the same issue I fixed with:

 d97eb8966c91 x86/irq: Check for valid irq descriptor in 
 check_irq_vectors_for_cpu_disable()

 Can you try whether applying this patch on your kernel fixes your
 issue?

 Yes, I agree d97eb8966c91 fix the IPANIC. I just notice a description from 
 http://lkml.kernel.org/r/20150204132754.ga10...@suse.de.

 I wish to share the debug result and root cause from my side.

commit d97eb8966c91f2c9d05f0a22eb89ed5b76d966d1 looks like a workaround.

Could Jin's patch be merged also?
free_msi_irqs does have a race with smp_callin=..=__setup_vector_irq.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] slub/slab: fix kmemleak didn't work on some case

2015-06-10 Thread Zhang, Yanmin
On 2015/6/9 23:03, Catalin Marinas wrote:
> On Tue, Jun 09, 2015 at 09:10:45AM +0100, Zhang, Yanmin wrote:
>> On 2015/6/8 18:13, Catalin Marinas wrote:
>>> As I replied already, I don't think this is that bad, or at least not
>>> worse than what kmemleak already does (looking at all data whether it's
>>> pointer or not).
>> It depends. As for memleak, developers prefers there are false alarms instead
>> of missing some leaked memory.
> Lots of false positives aren't that nice, you spend a lot of time
> debugging them (I've been there in the early kmemleak days). Anyway,
> your use case is not about false positives vs. negatives but just false
> negatives.
>
> My point is that there is a lot of random, pointer-like data read by
> kmemleak even without this memset (e.g. thread stacks, non-pointer data
> in kmalloc'ed structures, data/bss sections). Just doing this memset may
> reduce the chance of false negatives a bit but I don't think it would be
> noticeable.
>
> If there is some serious memory leak (lots of objects), they would
> likely show up at some point. Even if it's a one-off leak, it's possible
> that it shows up after some time (e.g. the object pointing to this
> memory block is freed).
>
>>>  It also doesn't solve the kmem_cache_alloc() case where
>>> the original object size is no longer available.
>> Such issue around kmem_cache_alloc() case happens only when the
>> caller doesn't initialize or use the full object, so the object keeps
>> old dirty data.
> The kmem_cache blocks size would be aligned to a cache line, so you
> still have some extra bytes never touched by the caller.
>
>> This patch is to resolve the redundant unused space (more than object size)
>> although the full object is used by kernel.
> So this solves only the cases where the original object size is still
> known (e.g. kmalloc). It could also be solved by telling kmemleak the
> actual object size.

Your explanation is reasonable. The patch is for debug purpose.
Maintainers can make decision based on balance.

Xinwu is a new developer in kernel community. Accepting the patch
into kernel can encourage him definitely. :)

Yanmin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] slub/slab: fix kmemleak didn't work on some case

2015-06-10 Thread Zhang, Yanmin
On 2015/6/9 23:03, Catalin Marinas wrote:
 On Tue, Jun 09, 2015 at 09:10:45AM +0100, Zhang, Yanmin wrote:
 On 2015/6/8 18:13, Catalin Marinas wrote:
 As I replied already, I don't think this is that bad, or at least not
 worse than what kmemleak already does (looking at all data whether it's
 pointer or not).
 It depends. As for memleak, developers prefers there are false alarms instead
 of missing some leaked memory.
 Lots of false positives aren't that nice, you spend a lot of time
 debugging them (I've been there in the early kmemleak days). Anyway,
 your use case is not about false positives vs. negatives but just false
 negatives.

 My point is that there is a lot of random, pointer-like data read by
 kmemleak even without this memset (e.g. thread stacks, non-pointer data
 in kmalloc'ed structures, data/bss sections). Just doing this memset may
 reduce the chance of false negatives a bit but I don't think it would be
 noticeable.

 If there is some serious memory leak (lots of objects), they would
 likely show up at some point. Even if it's a one-off leak, it's possible
 that it shows up after some time (e.g. the object pointing to this
 memory block is freed).

  It also doesn't solve the kmem_cache_alloc() case where
 the original object size is no longer available.
 Such issue around kmem_cache_alloc() case happens only when the
 caller doesn't initialize or use the full object, so the object keeps
 old dirty data.
 The kmem_cache blocks size would be aligned to a cache line, so you
 still have some extra bytes never touched by the caller.

 This patch is to resolve the redundant unused space (more than object size)
 although the full object is used by kernel.
 So this solves only the cases where the original object size is still
 known (e.g. kmalloc). It could also be solved by telling kmemleak the
 actual object size.

Your explanation is reasonable. The patch is for debug purpose.
Maintainers can make decision based on balance.

Xinwu is a new developer in kernel community. Accepting the patch
into kernel can encourage him definitely. :)

Yanmin

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] slub/slab: fix kmemleak didn't work on some case

2015-06-09 Thread Zhang, Yanmin
On 2015/6/8 18:13, Catalin Marinas wrote:
> On Mon, Jun 08, 2015 at 10:38:13AM +0100, Christoph Lameter wrote:
>> On Mon, 8 Jun 2015, Liu, XinwuX wrote:
>>
>>> when kernel uses kmalloc to allocate memory, slub/slab will find
>>> a suitable kmem_cache. Ususally the cache's object size is often
>>> greater than requested size. There is unused space which contains
>>> dirty data. These dirty data might have pointers pointing to a block
>> dirty? In what sense?
> I guess XinwuX meant uninitialised.

Uninitialized or dirty data used before being freed.

>
>>> of leaked memory. Kernel wouldn't consider this memory as leaked when
>>> scanning kmemleak object.
>> This has never been considered leaked memory before to my knowledge and
>> the data is already initialized.
> It's not the object being allocated that is considered leaked. But
> uninitialised data in this object is scanned by kmemleak and it may look
> like valid pointers to real leaked objects. So such data increases the
> number of kmemleak false negatives.

Yes, indeed.

>
> As I replied already, I don't think this is that bad, or at least not
> worse than what kmemleak already does (looking at all data whether it's
> pointer or not).

It depends. As for memleak, developers prefers there are false alarms instead
of missing some leaked memory.

>  It also doesn't solve the kmem_cache_alloc() case where
> the original object size is no longer available.

Such issue around kmem_cache_alloc() case happens only when the
caller doesn't initialize or use the full object, so the object keeps
old dirty data.
This patch is to resolve the redundant unused space (more than object size)
although the full object is used by kernel.

>
>> F.e. The zeroing function in linux/mm/slub.c::slab_alloc_node() zeros the
>> complete object and not only the number of bytes specified in the kmalloc
>> call. Same thing is true for SLAB.
> But that's only when __GFP_ZERO is passed.
>
Thanks for the kind comments. There is a balance between performance (new memset
consumes time) and debug capability. 

Yanmin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] slub/slab: fix kmemleak didn't work on some case

2015-06-09 Thread Zhang, Yanmin
On 2015/6/8 18:13, Catalin Marinas wrote:
 On Mon, Jun 08, 2015 at 10:38:13AM +0100, Christoph Lameter wrote:
 On Mon, 8 Jun 2015, Liu, XinwuX wrote:

 when kernel uses kmalloc to allocate memory, slub/slab will find
 a suitable kmem_cache. Ususally the cache's object size is often
 greater than requested size. There is unused space which contains
 dirty data. These dirty data might have pointers pointing to a block
 dirty? In what sense?
 I guess XinwuX meant uninitialised.

Uninitialized or dirty data used before being freed.


 of leaked memory. Kernel wouldn't consider this memory as leaked when
 scanning kmemleak object.
 This has never been considered leaked memory before to my knowledge and
 the data is already initialized.
 It's not the object being allocated that is considered leaked. But
 uninitialised data in this object is scanned by kmemleak and it may look
 like valid pointers to real leaked objects. So such data increases the
 number of kmemleak false negatives.

Yes, indeed.


 As I replied already, I don't think this is that bad, or at least not
 worse than what kmemleak already does (looking at all data whether it's
 pointer or not).

It depends. As for memleak, developers prefers there are false alarms instead
of missing some leaked memory.

  It also doesn't solve the kmem_cache_alloc() case where
 the original object size is no longer available.

Such issue around kmem_cache_alloc() case happens only when the
caller doesn't initialize or use the full object, so the object keeps
old dirty data.
This patch is to resolve the redundant unused space (more than object size)
although the full object is used by kernel.


 F.e. The zeroing function in linux/mm/slub.c::slab_alloc_node() zeros the
 complete object and not only the number of bytes specified in the kmalloc
 call. Same thing is true for SLAB.
 But that's only when __GFP_ZERO is passed.

Thanks for the kind comments. There is a balance between performance (new memset
consumes time) and debug capability. 

Yanmin

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V3 0/3] cdc-acm: fix incorrect runtime wakeup in acm_tty_write

2015-05-26 Thread Zhang, Yanmin
On 2015/5/27 12:13, Zhang, Yanmin wrote:
> Resend as V1/V2 have email format issue. Sorry for bothering.

Greg,

We have to abandon this patchset. Zhuang Jin Can, a USB expert,
reviewed the patches. He says acm_tty_write already considers
it carefully.

acm_tty_write puts the urb to a delayed queue when acm->susp_count
is not 0. acm_suspend adds 1 to acm->susp_count and acm_resume
decreases 1 from it .

Sorry for the bothering. Also thank Jin Can for the comments.

Yanmin

>
> I use Thunderbird. It has no a button to enable LKML email simply. :)
>
> V3: Change email config to resend.
>   Add a space in comment.
>
>  ---
>
> There is a scenario about cdc-acm utilization.Application opens
> n_gsm tty and cdc-acm tty. cdc-acm tty connects to xhci device.
> The application configures cdc-adm tty to n_gsm tty as ldisc tty.
>
> n_gsm=>cdc-acm=>xhci driver
>
> acm_tty_write can be called from n_gsm driver by ldisc connection,
> and from application when application opens cdc-acm tty directly.
> acm_tty_write wakes up the device by calling usb_autopm_get_interface_async,
> which calls pm_runtime_get. However, pm_runtime_get can't wake up
> the device before returning as it's an async wake up. Then, acm_tty_write
> might access the device when it is off.
>
> The patchset fixes it by:
> 1) add a new function usb_autopm_get_interface_upgrade to deal with
> above 2 requirements;
> 2) wake up device in n_gsm driver if n_gsm drivers calls cdc-acm driver;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V3 1/3] usb: add function usb_autopm_get_interface_upgrade

2015-05-26 Thread Zhang, Yanmin
From: Zhang Yanmin 

Some usb driver has a specific requirement. Their critical functions
might be called under both atomic environment and non-atomic environment.

If it's under atomic environment, the driver can wake up the device
by calling pm_runtime_get_sync directly.

If it's under non-atomic environment, the function's caller need wake
up the device before the function accesses the device.

The patch adds usb_autopm_get_interface_upgrade, a new function to
support above capability.

Signed-off-by: Zhang Yanmin 
---
 drivers/usb/core/driver.c | 54 +++
 include/linux/usb.h   |  3 +++
 2 files changed, 57 insertions(+)

diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c
index 818369a..5d6f9ee 100644
--- a/drivers/usb/core/driver.c
+++ b/drivers/usb/core/driver.c
@@ -1684,6 +1684,60 @@ int usb_autopm_get_interface(struct usb_interface *intf)
 EXPORT_SYMBOL_GPL(usb_autopm_get_interface);
 
 /**
+ * usb_autopm_get_interface_upgrade - increment a USB interface's PM-usage
+ * counter
+ * @intf: the usb_interface whose counter should be incremented
+ *
+ * This routine should be called by an interface driver when it wants to
+ * use @intf and needs to guarantee that it is not suspended.  In addition,
+ * the routine prevents @intf from being autosuspended subsequently.  (Note
+ * that this will not prevent suspend events originating in the PM core.)
+ * This prevention will persist until usb_autopm_put_interface() is called
+ * or @intf is unbound.  A typical example would be a character-device
+ * driver when its device file is opened.
+ *
+ * Comparing with usb_autopm_get_interface, usb_autopm_get_interface_upgrade
+ * is more careful when resuming the device.
+ * 1) The caller's caller already resumes resume the device and hold spinlocks.
+ * usb_autopm_get_interface_upgrade couldn't call pm_runtime_get_sync;
+ * 2) The caller's caller doesn't resume the device.
+ * usb_autopm_get_interface_upgrade has to resume the device before going 
ahead.
+ *
+ * @intf's usage counter is incremented to prevent subsequent autosuspends.
+ * However if the autoresume fails then the counter is re-decremented.
+ *
+ * This routine can run only in process context.
+ *
+ * Return: 0 on success.
+ */
+int usb_autopm_get_interface_upgrade(struct usb_interface *intf)
+{
+   int status = 0;
+
+   pm_runtime_get(>dev);
+   if (!pm_runtime_active(>dev)) {
+   /* If not active, next _get_sync wakes device up*/
+   status = pm_runtime_get_sync(>dev);
+   /*
+* If it's active, next _put_sync wouldn't
+* really put it to sleep as the 1st _get
+* keeps the device active.
+*/
+   pm_runtime_put_sync(>dev);
+   if (status < 0)
+   pm_runtime_put(>dev);
+   } else
+   atomic_inc(>pm_usage_cnt);
+   dev_vdbg(>dev, "%s: cnt %d -> %d\n",
+   __func__, atomic_read(>dev.power.usage_count),
+   status);
+   if (status > 0)
+   status = 0;
+   return status;
+}
+EXPORT_SYMBOL_GPL(usb_autopm_get_interface_upgrade);
+
+/**
  * usb_autopm_get_interface_async - increment a USB interface's PM-usage 
counter
  * @intf: the usb_interface whose counter should be incremented
  *
diff --git a/include/linux/usb.h b/include/linux/usb.h
index 447fe29..0a8a44a 100644
--- a/include/linux/usb.h
+++ b/include/linux/usb.h
@@ -663,6 +663,7 @@ extern void usb_enable_autosuspend(struct usb_device *udev);
 extern void usb_disable_autosuspend(struct usb_device *udev);
 
 extern int usb_autopm_get_interface(struct usb_interface *intf);
+extern int usb_autopm_get_interface_upgrade(struct usb_interface *intf);
 extern void usb_autopm_put_interface(struct usb_interface *intf);
 extern int usb_autopm_get_interface_async(struct usb_interface *intf);
 extern void usb_autopm_put_interface_async(struct usb_interface *intf);
@@ -683,6 +684,8 @@ static inline int usb_disable_autosuspend(struct usb_device 
*udev)
 
 static inline int usb_autopm_get_interface(struct usb_interface *intf)
 { return 0; }
+static inline int usb_autopm_get_interface_upgrade(struct usb_interface *intf)
+{ return 0; }
 static inline int usb_autopm_get_interface_async(struct usb_interface *intf)
 { return 0; }
 
-- 
1.9.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V3 3/3] n_gsm: wake up ldisc tty before using it

2015-05-26 Thread Zhang, Yanmin
From: Zhang Yanmin 

Wake up ldisc device before calling its driver to access the device.

Signed-off-by: Zhang Yanmin 
---
 drivers/tty/n_gsm.c | 40 +++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
index 2c34c32..f887df6 100644
--- a/drivers/tty/n_gsm.c
+++ b/drivers/tty/n_gsm.c
@@ -62,6 +62,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int debug;
 module_param(debug, int, 0600);
@@ -555,6 +556,27 @@ static int gsm_stuff_frame(const u8 *input, u8 *output, 
int len)
return olen;
 }
 
+static int pm_runtime_get_sync_tty(struct tty_struct *tty)
+{
+   int ret = 0;
+
+   /* Wakeup parent as tty itself doesn't enable runtime */
+   if (tty->dev->parent)
+   ret = pm_runtime_get_sync(tty->dev->parent);
+
+   return ret;
+}
+
+static int pm_runtime_put_tty(struct tty_struct *tty)
+{
+   int ret = 0;
+
+   if (tty->dev->parent)
+   ret = pm_runtime_put(tty->dev->parent);
+
+   return ret;
+}
+
 /**
  * gsm_send-   send a control frame
  * @gsm: our GSM mux
@@ -1511,7 +1533,9 @@ static void gsm_dlci_begin_open(struct gsm_dlci *dlci)
return;
dlci->retries = gsm->n2;
dlci->state = DLCI_OPENING;
+   pm_runtime_get_sync_tty(gsm->tty);
gsm_command(dlci->gsm, dlci->addr, SABM|PF);
+   pm_runtime_put_tty(gsm->tty);
mod_timer(>t1, jiffies + gsm->t1 * HZ / 100);
 }
 
@@ -1533,7 +1557,9 @@ static void gsm_dlci_begin_close(struct gsm_dlci *dlci)
return;
dlci->retries = gsm->n2;
dlci->state = DLCI_CLOSING;
+   pm_runtime_get_sync_tty(gsm->tty);
gsm_command(dlci->gsm, dlci->addr, DISC|PF);
+   pm_runtime_put_tty(gsm->tty);
mod_timer(>t1, jiffies + gsm->t1 * HZ / 100);
 }
 
@@ -2286,7 +2312,9 @@ static void gsmld_receive_buf(struct tty_struct *tty, 
const unsigned char *cp,
flags = *f++;
switch (flags) {
case TTY_NORMAL:
+   pm_runtime_get_sync_tty(gsm->tty);
gsm->receive(gsm, *dp);
+   pm_runtime_put_tty(gsm->tty);
break;
case TTY_OVERRUN:
case TTY_BREAK:
@@ -2957,6 +2985,7 @@ static int gsmtty_open(struct tty_struct *tty, struct 
file *filp)
 {
struct gsm_dlci *dlci = tty->driver_data;
struct tty_port *port = >port;
+   int ret;
 
port->count++;
tty_port_tty_set(port, tty);
@@ -2968,7 +2997,11 @@ static int gsmtty_open(struct tty_struct *tty, struct 
file *filp)
/* Start sending off SABM messages */
gsm_dlci_begin_open(dlci);
/* And wait for virtual carrier */
-   return tty_port_block_til_ready(port, tty, filp);
+   pm_runtime_get_sync_tty(dlci->gsm->tty);
+   ret = tty_port_block_til_ready(port, tty, filp);
+   pm_runtime_put_tty(dlci->gsm->tty);
+
+   return ret;
 }
 
 static void gsmtty_close(struct tty_struct *tty, struct file *filp)
@@ -2986,11 +3019,14 @@ static void gsmtty_close(struct tty_struct *tty, struct 
file *filp)
gsm = dlci->gsm;
if (tty_port_close_start(>port, tty, filp) == 0)
return;
+   pm_runtime_get_sync_tty(gsm->tty);
gsm_dlci_begin_close(dlci);
if (test_bit(ASYNCB_INITIALIZED, >port.flags)) {
if (C_HUPCL(tty))
tty_port_lower_dtr_rts(>port);
}
+   pm_runtime_put_tty(gsm->tty);
+
tty_port_close_end(>port, tty);
tty_port_tty_set(>port, NULL);
return;
@@ -3012,10 +3048,12 @@ static int gsmtty_write(struct tty_struct *tty, const 
unsigned char *buf,
struct gsm_dlci *dlci = tty->driver_data;
if (dlci->state == DLCI_CLOSED)
return -EINVAL;
+   pm_runtime_get_sync_tty(dlci->gsm->tty);
/* Stuff the bytes into the fifo queue */
sent = kfifo_in_locked(dlci->fifo, buf, len, >lock);
/* Need to kick the channel */
gsm_dlci_data_kick(dlci);
+   pm_runtime_put_tty(dlci->gsm->tty);
return sent;
 }
 
-- 
1.9.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V3 2/3] cdc-acm: call usb_autopm_get_interface_upgrade in acm_tty_write

2015-05-26 Thread Zhang, Yanmin
From: Zhang Yanmin 

acm device might be used as ldisc device by n_gsm driver.
gsmtty_write and other gsm functions calls acm_tty_write
indirectly while they holds spinlocks.

Meanwhile, application might access ACM tty device directly.

Here we choose to call usb_autopm_get_interface_upgrade instead of
usb_autopm_get_interface_async to make sure above 2 scenarios can
work well.

Signed-off-by: Zhang Yanmin 
---
 drivers/usb/class/cdc-acm.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 5c8f581..6ad85a3 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -689,10 +689,15 @@ static int acm_tty_write(struct tty_struct *tty,
 
dev_vdbg(>data->dev, "%s - count %d\n", __func__, count);
 
+   stat = usb_autopm_get_interface_upgrade(acm->control);
+   if (stat)
+   return stat;
+
spin_lock_irqsave(>write_lock, flags);
wbn = acm_wb_alloc(acm);
if (wbn < 0) {
spin_unlock_irqrestore(>write_lock, flags);
+   usb_autopm_put_interface(acm->control);
return 0;
}
wb = >wb[wbn];
@@ -700,6 +705,7 @@ static int acm_tty_write(struct tty_struct *tty,
if (!acm->dev) {
wb->use = 0;
spin_unlock_irqrestore(>write_lock, flags);
+   usb_autopm_put_interface(acm->control);
return -ENODEV;
}
 
@@ -708,13 +714,6 @@ static int acm_tty_write(struct tty_struct *tty,
memcpy(wb->buf, buf, count);
wb->len = count;
 
-   stat = usb_autopm_get_interface_async(acm->control);
-   if (stat) {
-   wb->use = 0;
-   spin_unlock_irqrestore(>write_lock, flags);
-   return stat;
-   }
-
if (acm->susp_count) {
usb_anchor_urb(wb->urb, >delayed);
spin_unlock_irqrestore(>write_lock, flags);
-- 
1.9.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V3 0/3] cdc-acm: fix incorrect runtime wakeup in acm_tty_write

2015-05-26 Thread Zhang, Yanmin
Resend as V1/V2 have email format issue. Sorry for bothering.

I use Thunderbird. It has no a button to enable LKML email simply. :)

V3: Change email config to resend.
Add a space in comment.

 ---

There is a scenario about cdc-acm utilization.Application opens
n_gsm tty and cdc-acm tty. cdc-acm tty connects to xhci device.
The application configures cdc-adm tty to n_gsm tty as ldisc tty.

n_gsm=>cdc-acm=>xhci driver

acm_tty_write can be called from n_gsm driver by ldisc connection,
and from application when application opens cdc-acm tty directly.
acm_tty_write wakes up the device by calling usb_autopm_get_interface_async,
which calls pm_runtime_get. However, pm_runtime_get can't wake up
the device before returning as it's an async wake up. Then, acm_tty_write
might access the device when it is off.

The patchset fixes it by:
1) add a new function usb_autopm_get_interface_upgrade to deal with
above 2 requirements;
2) wake up device in n_gsm driver if n_gsm drivers calls cdc-acm driver;

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 3/3] n_gsm: wake up ldisc tty before using it

2015-05-26 Thread Zhang, Yanmin
On 2015/5/27 11:02, Greg KH wrote:
> On Wed, May 27, 2015 at 10:50:01AM +0800, Zhang, Yanmin wrote:
>> Wake up ldisc device before calling its driver to access the device.
>>
>> Signed-off-by: Zhang Yanmin 
>>
>> ---
>>
>>  drivers/tty/n_gsm.c | 40 +++-
>>  1 file changed, 39 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
>> index 2c34c32..40671fa 100644
>> --- a/drivers/tty/n_gsm.c
>> +++ b/drivers/tty/n_gsm.c
>> @@ -62,6 +62,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  static int debug;
>>  module_param(debug, int, 0600);
>> @@ -555,6 +556,27 @@ static int gsm_stuff_frame(const u8 *input, u8 *output, 
>> int len)
>>  return olen;
>>  }
>>  
>> +static int pm_runtime_get_sync_tty(struct tty_struct *tty)
>> +{
>> +int ret = 0;
>> +
>> +/*Wakeup parent as tty itself doesn't enable runtime*/
> No spaces in your comment?
I will add it.

>
> Anyway, this is corrupted and can't be applied, please fix up your email
> client and try it again...
I check the patch by scripts/checkpatch.pl and everything seems good.

I use Thunderbird 31.7.0, the latest. It auto updates to the latest version. 
Perhaps some new
config options changed something.

It seems email client converts some tab to space automatically.

Yanmin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V2 3/3] n_gsm: wake up ldisc tty before using it

2015-05-26 Thread Zhang, Yanmin
Wake up ldisc device before calling its driver to access the device.

Signed-off-by: Zhang Yanmin 

---

 drivers/tty/n_gsm.c | 40 +++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
index 2c34c32..40671fa 100644
--- a/drivers/tty/n_gsm.c
+++ b/drivers/tty/n_gsm.c
@@ -62,6 +62,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int debug;
 module_param(debug, int, 0600);
@@ -555,6 +556,27 @@ static int gsm_stuff_frame(const u8 *input, u8 *output, 
int len)
 return olen;
 }
 
+static int pm_runtime_get_sync_tty(struct tty_struct *tty)
+{
+int ret = 0;
+
+/*Wakeup parent as tty itself doesn't enable runtime*/
+if (tty->dev->parent)
+ret = pm_runtime_get_sync(tty->dev->parent);
+
+return ret;
+}
+
+static int pm_runtime_put_tty(struct tty_struct *tty)
+{
+int ret = 0;
+
+if (tty->dev->parent)
+ret = pm_runtime_put(tty->dev->parent);
+
+return ret;
+}
+
 /**
  *gsm_send-send a control frame
  *@gsm: our GSM mux
@@ -1511,7 +1533,9 @@ static void gsm_dlci_begin_open(struct gsm_dlci *dlci)
 return;
 dlci->retries = gsm->n2;
 dlci->state = DLCI_OPENING;
+pm_runtime_get_sync_tty(gsm->tty);
 gsm_command(dlci->gsm, dlci->addr, SABM|PF);
+pm_runtime_put_tty(gsm->tty);
 mod_timer(>t1, jiffies + gsm->t1 * HZ / 100);
 }
 
@@ -1533,7 +1557,9 @@ static void gsm_dlci_begin_close(struct gsm_dlci *dlci)
 return;
 dlci->retries = gsm->n2;
 dlci->state = DLCI_CLOSING;
+pm_runtime_get_sync_tty(gsm->tty);
 gsm_command(dlci->gsm, dlci->addr, DISC|PF);
+pm_runtime_put_tty(gsm->tty);
 mod_timer(>t1, jiffies + gsm->t1 * HZ / 100);
 }
 
@@ -2286,7 +2312,9 @@ static void gsmld_receive_buf(struct tty_struct *tty, 
const unsigned char *cp,
 flags = *f++;
 switch (flags) {
 case TTY_NORMAL:
+pm_runtime_get_sync_tty(gsm->tty);
 gsm->receive(gsm, *dp);
+pm_runtime_put_tty(gsm->tty);
 break;
 case TTY_OVERRUN:
 case TTY_BREAK:
@@ -2957,6 +2985,7 @@ static int gsmtty_open(struct tty_struct *tty, struct 
file *filp)
 {
 struct gsm_dlci *dlci = tty->driver_data;
 struct tty_port *port = >port;
+int ret;
 
 port->count++;
 tty_port_tty_set(port, tty);
@@ -2968,7 +2997,11 @@ static int gsmtty_open(struct tty_struct *tty, struct 
file *filp)
 /* Start sending off SABM messages */
 gsm_dlci_begin_open(dlci);
 /* And wait for virtual carrier */
-return tty_port_block_til_ready(port, tty, filp);
+pm_runtime_get_sync_tty(dlci->gsm->tty);
+ret = tty_port_block_til_ready(port, tty, filp);
+pm_runtime_put_tty(dlci->gsm->tty);
+
+return ret;
 }
 
 static void gsmtty_close(struct tty_struct *tty, struct file *filp)
@@ -2986,11 +3019,14 @@ static void gsmtty_close(struct tty_struct *tty, struct 
file *filp)
 gsm = dlci->gsm;
 if (tty_port_close_start(>port, tty, filp) == 0)
 return;
+pm_runtime_get_sync_tty(gsm->tty);
 gsm_dlci_begin_close(dlci);
 if (test_bit(ASYNCB_INITIALIZED, >port.flags)) {
 if (C_HUPCL(tty))
 tty_port_lower_dtr_rts(>port);
 }
+pm_runtime_put_tty(gsm->tty);
+
 tty_port_close_end(>port, tty);
 tty_port_tty_set(>port, NULL);
 return;
@@ -3012,10 +3048,12 @@ static int gsmtty_write(struct tty_struct *tty, const 
unsigned char *buf,
 struct gsm_dlci *dlci = tty->driver_data;
 if (dlci->state == DLCI_CLOSED)
 return -EINVAL;
+pm_runtime_get_sync_tty(dlci->gsm->tty);
 /* Stuff the bytes into the fifo queue */
 sent = kfifo_in_locked(dlci->fifo, buf, len, >lock);
 /* Need to kick the channel */
 gsm_dlci_data_kick(dlci);
+pm_runtime_put_tty(dlci->gsm->tty);
 return sent;
 }
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V2 2/3] cdc-acm: call usb_autopm_get_interface_upgrade in acm_tty_write

2015-05-26 Thread Zhang, Yanmin
acm device might be used as ldisc device by n_gsm driver.
gsmtty_write and other gsm functions calls acm_tty_write
indirectly while they holds spinlocks.

Meanwhile, application might access ACM tty device directly.

Here we choose to call usb_autopm_get_interface_upgrade instead of
usb_autopm_get_interface_async to make sure above 2 scenarios can
work well.

Signed-off-by: Zhang Yanmin 

---

 drivers/usb/class/cdc-acm.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 5c8f581..6ad85a3 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -689,10 +689,15 @@ static int acm_tty_write(struct tty_struct *tty,
 
 dev_vdbg(>data->dev, "%s - count %d\n", __func__, count);
 
+stat = usb_autopm_get_interface_upgrade(acm->control);
+if (stat)
+return stat;
+
 spin_lock_irqsave(>write_lock, flags);
 wbn = acm_wb_alloc(acm);
 if (wbn < 0) {
 spin_unlock_irqrestore(>write_lock, flags);
+usb_autopm_put_interface(acm->control);
 return 0;
 }
 wb = >wb[wbn];
@@ -700,6 +705,7 @@ static int acm_tty_write(struct tty_struct *tty,
 if (!acm->dev) {
 wb->use = 0;
 spin_unlock_irqrestore(>write_lock, flags);
+usb_autopm_put_interface(acm->control);
 return -ENODEV;
 }
 
@@ -708,13 +714,6 @@ static int acm_tty_write(struct tty_struct *tty,
 memcpy(wb->buf, buf, count);
 wb->len = count;
 
-stat = usb_autopm_get_interface_async(acm->control);
-if (stat) {
-wb->use = 0;
-spin_unlock_irqrestore(>write_lock, flags);
-return stat;
-}
-
 if (acm->susp_count) {
 usb_anchor_urb(wb->urb, >delayed);
 spin_unlock_irqrestore(>write_lock, flags);
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V2 1/3] usb: add function usb_autopm_get_interface_upgrade

2015-05-26 Thread Zhang, Yanmin
Some usb driver has a specific requirement. Their critical functions
might be called under both atomic environment and non-atomic environment.

If it's under atomic environment, the driver can wake up the device
by calling pm_runtime_get_sync directly.

If it's under non-atomic environment, the function's caller need wake
up the device before the function accesses the device.

The patch adds usb_autopm_get_interface_upgrade, a new function to
support above capability.

Signed-off-by: Zhang Yanmin 

---

 drivers/usb/core/driver.c | 54 +++
 include/linux/usb.h   |  3 +++
 2 files changed, 57 insertions(+)

diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c
index 818369a..5d6f9ee 100644
--- a/drivers/usb/core/driver.c
+++ b/drivers/usb/core/driver.c
@@ -1684,6 +1684,60 @@ int usb_autopm_get_interface(struct usb_interface *intf)
 EXPORT_SYMBOL_GPL(usb_autopm_get_interface);
 
 /**
+ * usb_autopm_get_interface_upgrade - increment a USB interface's PM-usage
+ * counter
+ * @intf: the usb_interface whose counter should be incremented
+ *
+ * This routine should be called by an interface driver when it wants to
+ * use @intf and needs to guarantee that it is not suspended.  In addition,
+ * the routine prevents @intf from being autosuspended subsequently.  (Note
+ * that this will not prevent suspend events originating in the PM core.)
+ * This prevention will persist until usb_autopm_put_interface() is called
+ * or @intf is unbound.  A typical example would be a character-device
+ * driver when its device file is opened.
+ *
+ * Comparing with usb_autopm_get_interface, usb_autopm_get_interface_upgrade
+ * is more careful when resuming the device.
+ * 1) The caller's caller already resumes resume the device and hold spinlocks.
+ * usb_autopm_get_interface_upgrade couldn't call pm_runtime_get_sync;
+ * 2) The caller's caller doesn't resume the device.
+ * usb_autopm_get_interface_upgrade has to resume the device before going 
ahead.
+ *
+ * @intf's usage counter is incremented to prevent subsequent autosuspends.
+ * However if the autoresume fails then the counter is re-decremented.
+ *
+ * This routine can run only in process context.
+ *
+ * Return: 0 on success.
+ */
+int usb_autopm_get_interface_upgrade(struct usb_interface *intf)
+{
+intstatus = 0;
+
+pm_runtime_get(>dev);
+if (!pm_runtime_active(>dev)) {
+/* If not active, next _get_sync wakes device up*/
+status = pm_runtime_get_sync(>dev);
+/*
+ * If it's active, next _put_sync wouldn't
+ * really put it to sleep as the 1st _get
+ * keeps the device active.
+ */
+pm_runtime_put_sync(>dev);
+if (status < 0)
+pm_runtime_put(>dev);
+} else
+atomic_inc(>pm_usage_cnt);
+dev_vdbg(>dev, "%s: cnt %d -> %d\n",
+__func__, atomic_read(>dev.power.usage_count),
+status);
+if (status > 0)
+status = 0;
+return status;
+}
+EXPORT_SYMBOL_GPL(usb_autopm_get_interface_upgrade);
+
+/**
  * usb_autopm_get_interface_async - increment a USB interface's PM-usage 
counter
  * @intf: the usb_interface whose counter should be incremented
  *
diff --git a/include/linux/usb.h b/include/linux/usb.h
index 447fe29..0a8a44a 100644
--- a/include/linux/usb.h
+++ b/include/linux/usb.h
@@ -663,6 +663,7 @@ extern void usb_enable_autosuspend(struct usb_device *udev);
 extern void usb_disable_autosuspend(struct usb_device *udev);
 
 extern int usb_autopm_get_interface(struct usb_interface *intf);
+extern int usb_autopm_get_interface_upgrade(struct usb_interface *intf);
 extern void usb_autopm_put_interface(struct usb_interface *intf);
 extern int usb_autopm_get_interface_async(struct usb_interface *intf);
 extern void usb_autopm_put_interface_async(struct usb_interface *intf);
@@ -683,6 +684,8 @@ static inline int usb_disable_autosuspend(struct usb_device 
*udev)
 
 static inline int usb_autopm_get_interface(struct usb_interface *intf)
 { return 0; }
+static inline int usb_autopm_get_interface_upgrade(struct usb_interface *intf)
+{ return 0; }
 static inline int usb_autopm_get_interface_async(struct usb_interface *intf)
 { return 0; }
 
-- 
1.9.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V2 0/3] cdc-acm: fix incorrect runtime wakeup in acm_tty_write

2015-05-26 Thread Zhang, Yanmin
Resend as V1 has email format issue. Sorry for bothering.

---

There is a scenario about cdc-acm utilization.Application opens
n_gsm tty and cdc-acm tty. cdc-acm tty connects to xhci device.
The application configures cdc-adm tty to n_gsm tty as ldisc tty.

n_gsm=>cdc-acm=>xhci driver

acm_tty_write can be called from n_gsm driver by ldisc connection,
and from application when application opens cdc-acm tty directly.
acm_tty_write wakes up the device by calling usb_autopm_get_interface_async,
which calls pm_runtime_get. However, pm_runtime_get can't wake up
the device before returning as it's an async wake up. Then, acm_tty_write
might access the device when it is off.

The patchset fixes it by:
1) add a new function usb_autopm_get_interface_upgrade to deal with
above 2 requirements;
2) wake up device in n_gsm driver if n_gsm drivers calls cdc-acm driver;


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] usb: add function usb_autopm_get_interface_upgrade

2015-05-26 Thread Zhang, Yanmin

On 2015/5/26 22:12, Greg Kroah-Hartman wrote:


On Tue, May 26, 2015 at 10:19:57AM +0800, Zhang, Yanmin wrote:

Some usb driver has a specific requirement. Their critical functions

might be called under both atomic environment and non-atomic environment.

If it's under atomic environment, the driver can wake up the device

by calling pm_runtime_get_sync directly.

If it's under non-atomic environment, the function's caller need wake

up the device before the function accesses the device.

The patch adds usb_autopm_get_interface_upgrade, a new function to

support above capability.

Signed-off-by: Zhang Yanmin 

---

  drivers/usb/core/driver.c | 53 +++

  include/linux/usb.h   |  3 +++

  2 files changed, 56 insertions(+)

diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c

index 818369a..d07fd8d 100644

--- a/drivers/usb/core/driver.c

+++ b/drivers/usb/core/driver.c

@@ -1684,6 +1684,59 @@ int usb_autopm_get_interface(struct usb_interface *intf)

  EXPORT_SYMBOL_GPL(usb_autopm_get_interface);


  /**

+ * usb_autopm_get_interface_retry - increment a USB interface's PM-usage 
counter

+ * @intf: the usb_interface whose counter should be incremented

Your emails are obviously corrupted and can't be applied..  Please fix
up and resend all of them.
Sorry. I didn't send patches for a long time. I would resend the 
patchset again.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] usb: add function usb_autopm_get_interface_upgrade

2015-05-26 Thread Zhang, Yanmin

On 2015/5/26 22:12, Greg Kroah-Hartman wrote:


On Tue, May 26, 2015 at 10:19:57AM +0800, Zhang, Yanmin wrote:

Some usb driver has a specific requirement. Their critical functions

might be called under both atomic environment and non-atomic environment.

If it's under atomic environment, the driver can wake up the device

by calling pm_runtime_get_sync directly.

If it's under non-atomic environment, the function's caller need wake

up the device before the function accesses the device.

The patch adds usb_autopm_get_interface_upgrade, a new function to

support above capability.

Signed-off-by: Zhang Yanmin yanmin.zh...@intel.com

---

  drivers/usb/core/driver.c | 53 +++

  include/linux/usb.h   |  3 +++

  2 files changed, 56 insertions(+)

diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c

index 818369a..d07fd8d 100644

--- a/drivers/usb/core/driver.c

+++ b/drivers/usb/core/driver.c

@@ -1684,6 +1684,59 @@ int usb_autopm_get_interface(struct usb_interface *intf)

  EXPORT_SYMBOL_GPL(usb_autopm_get_interface);


  /**

+ * usb_autopm_get_interface_retry - increment a USB interface's PM-usage 
counter

+ * @intf: the usb_interface whose counter should be incremented

Your emails are obviously corrupted and can't be applied..  Please fix
up and resend all of them.
Sorry. I didn't send patches for a long time. I would resend the 
patchset again.



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V2 1/3] usb: add function usb_autopm_get_interface_upgrade

2015-05-26 Thread Zhang, Yanmin
Some usb driver has a specific requirement. Their critical functions
might be called under both atomic environment and non-atomic environment.

If it's under atomic environment, the driver can wake up the device
by calling pm_runtime_get_sync directly.

If it's under non-atomic environment, the function's caller need wake
up the device before the function accesses the device.

The patch adds usb_autopm_get_interface_upgrade, a new function to
support above capability.

Signed-off-by: Zhang Yanmin yanmin.zh...@intel.com

---

 drivers/usb/core/driver.c | 54 +++
 include/linux/usb.h   |  3 +++
 2 files changed, 57 insertions(+)

diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c
index 818369a..5d6f9ee 100644
--- a/drivers/usb/core/driver.c
+++ b/drivers/usb/core/driver.c
@@ -1684,6 +1684,60 @@ int usb_autopm_get_interface(struct usb_interface *intf)
 EXPORT_SYMBOL_GPL(usb_autopm_get_interface);
 
 /**
+ * usb_autopm_get_interface_upgrade - increment a USB interface's PM-usage
+ * counter
+ * @intf: the usb_interface whose counter should be incremented
+ *
+ * This routine should be called by an interface driver when it wants to
+ * use @intf and needs to guarantee that it is not suspended.  In addition,
+ * the routine prevents @intf from being autosuspended subsequently.  (Note
+ * that this will not prevent suspend events originating in the PM core.)
+ * This prevention will persist until usb_autopm_put_interface() is called
+ * or @intf is unbound.  A typical example would be a character-device
+ * driver when its device file is opened.
+ *
+ * Comparing with usb_autopm_get_interface, usb_autopm_get_interface_upgrade
+ * is more careful when resuming the device.
+ * 1) The caller's caller already resumes resume the device and hold spinlocks.
+ * usb_autopm_get_interface_upgrade couldn't call pm_runtime_get_sync;
+ * 2) The caller's caller doesn't resume the device.
+ * usb_autopm_get_interface_upgrade has to resume the device before going 
ahead.
+ *
+ * @intf's usage counter is incremented to prevent subsequent autosuspends.
+ * However if the autoresume fails then the counter is re-decremented.
+ *
+ * This routine can run only in process context.
+ *
+ * Return: 0 on success.
+ */
+int usb_autopm_get_interface_upgrade(struct usb_interface *intf)
+{
+intstatus = 0;
+
+pm_runtime_get(intf-dev);
+if (!pm_runtime_active(intf-dev)) {
+/* If not active, next _get_sync wakes device up*/
+status = pm_runtime_get_sync(intf-dev);
+/*
+ * If it's active, next _put_sync wouldn't
+ * really put it to sleep as the 1st _get
+ * keeps the device active.
+ */
+pm_runtime_put_sync(intf-dev);
+if (status  0)
+pm_runtime_put(intf-dev);
+} else
+atomic_inc(intf-pm_usage_cnt);
+dev_vdbg(intf-dev, %s: cnt %d - %d\n,
+__func__, atomic_read(intf-dev.power.usage_count),
+status);
+if (status  0)
+status = 0;
+return status;
+}
+EXPORT_SYMBOL_GPL(usb_autopm_get_interface_upgrade);
+
+/**
  * usb_autopm_get_interface_async - increment a USB interface's PM-usage 
counter
  * @intf: the usb_interface whose counter should be incremented
  *
diff --git a/include/linux/usb.h b/include/linux/usb.h
index 447fe29..0a8a44a 100644
--- a/include/linux/usb.h
+++ b/include/linux/usb.h
@@ -663,6 +663,7 @@ extern void usb_enable_autosuspend(struct usb_device *udev);
 extern void usb_disable_autosuspend(struct usb_device *udev);
 
 extern int usb_autopm_get_interface(struct usb_interface *intf);
+extern int usb_autopm_get_interface_upgrade(struct usb_interface *intf);
 extern void usb_autopm_put_interface(struct usb_interface *intf);
 extern int usb_autopm_get_interface_async(struct usb_interface *intf);
 extern void usb_autopm_put_interface_async(struct usb_interface *intf);
@@ -683,6 +684,8 @@ static inline int usb_disable_autosuspend(struct usb_device 
*udev)
 
 static inline int usb_autopm_get_interface(struct usb_interface *intf)
 { return 0; }
+static inline int usb_autopm_get_interface_upgrade(struct usb_interface *intf)
+{ return 0; }
 static inline int usb_autopm_get_interface_async(struct usb_interface *intf)
 { return 0; }
 
-- 
1.9.1


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V2 0/3] cdc-acm: fix incorrect runtime wakeup in acm_tty_write

2015-05-26 Thread Zhang, Yanmin
Resend as V1 has email format issue. Sorry for bothering.

---

There is a scenario about cdc-acm utilization.Application opens
n_gsm tty and cdc-acm tty. cdc-acm tty connects to xhci device.
The application configures cdc-adm tty to n_gsm tty as ldisc tty.

n_gsm=cdc-acm=xhci driver

acm_tty_write can be called from n_gsm driver by ldisc connection,
and from application when application opens cdc-acm tty directly.
acm_tty_write wakes up the device by calling usb_autopm_get_interface_async,
which calls pm_runtime_get. However, pm_runtime_get can't wake up
the device before returning as it's an async wake up. Then, acm_tty_write
might access the device when it is off.

The patchset fixes it by:
1) add a new function usb_autopm_get_interface_upgrade to deal with
above 2 requirements;
2) wake up device in n_gsm driver if n_gsm drivers calls cdc-acm driver;


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V2 3/3] n_gsm: wake up ldisc tty before using it

2015-05-26 Thread Zhang, Yanmin
Wake up ldisc device before calling its driver to access the device.

Signed-off-by: Zhang Yanmin yanmin.zh...@intel.com

---

 drivers/tty/n_gsm.c | 40 +++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
index 2c34c32..40671fa 100644
--- a/drivers/tty/n_gsm.c
+++ b/drivers/tty/n_gsm.c
@@ -62,6 +62,7 @@
 #include linux/netdevice.h
 #include linux/etherdevice.h
 #include linux/gsmmux.h
+#include linux/pm_runtime.h
 
 static int debug;
 module_param(debug, int, 0600);
@@ -555,6 +556,27 @@ static int gsm_stuff_frame(const u8 *input, u8 *output, 
int len)
 return olen;
 }
 
+static int pm_runtime_get_sync_tty(struct tty_struct *tty)
+{
+int ret = 0;
+
+/*Wakeup parent as tty itself doesn't enable runtime*/
+if (tty-dev-parent)
+ret = pm_runtime_get_sync(tty-dev-parent);
+
+return ret;
+}
+
+static int pm_runtime_put_tty(struct tty_struct *tty)
+{
+int ret = 0;
+
+if (tty-dev-parent)
+ret = pm_runtime_put(tty-dev-parent);
+
+return ret;
+}
+
 /**
  *gsm_send-send a control frame
  *@gsm: our GSM mux
@@ -1511,7 +1533,9 @@ static void gsm_dlci_begin_open(struct gsm_dlci *dlci)
 return;
 dlci-retries = gsm-n2;
 dlci-state = DLCI_OPENING;
+pm_runtime_get_sync_tty(gsm-tty);
 gsm_command(dlci-gsm, dlci-addr, SABM|PF);
+pm_runtime_put_tty(gsm-tty);
 mod_timer(dlci-t1, jiffies + gsm-t1 * HZ / 100);
 }
 
@@ -1533,7 +1557,9 @@ static void gsm_dlci_begin_close(struct gsm_dlci *dlci)
 return;
 dlci-retries = gsm-n2;
 dlci-state = DLCI_CLOSING;
+pm_runtime_get_sync_tty(gsm-tty);
 gsm_command(dlci-gsm, dlci-addr, DISC|PF);
+pm_runtime_put_tty(gsm-tty);
 mod_timer(dlci-t1, jiffies + gsm-t1 * HZ / 100);
 }
 
@@ -2286,7 +2312,9 @@ static void gsmld_receive_buf(struct tty_struct *tty, 
const unsigned char *cp,
 flags = *f++;
 switch (flags) {
 case TTY_NORMAL:
+pm_runtime_get_sync_tty(gsm-tty);
 gsm-receive(gsm, *dp);
+pm_runtime_put_tty(gsm-tty);
 break;
 case TTY_OVERRUN:
 case TTY_BREAK:
@@ -2957,6 +2985,7 @@ static int gsmtty_open(struct tty_struct *tty, struct 
file *filp)
 {
 struct gsm_dlci *dlci = tty-driver_data;
 struct tty_port *port = dlci-port;
+int ret;
 
 port-count++;
 tty_port_tty_set(port, tty);
@@ -2968,7 +2997,11 @@ static int gsmtty_open(struct tty_struct *tty, struct 
file *filp)
 /* Start sending off SABM messages */
 gsm_dlci_begin_open(dlci);
 /* And wait for virtual carrier */
-return tty_port_block_til_ready(port, tty, filp);
+pm_runtime_get_sync_tty(dlci-gsm-tty);
+ret = tty_port_block_til_ready(port, tty, filp);
+pm_runtime_put_tty(dlci-gsm-tty);
+
+return ret;
 }
 
 static void gsmtty_close(struct tty_struct *tty, struct file *filp)
@@ -2986,11 +3019,14 @@ static void gsmtty_close(struct tty_struct *tty, struct 
file *filp)
 gsm = dlci-gsm;
 if (tty_port_close_start(dlci-port, tty, filp) == 0)
 return;
+pm_runtime_get_sync_tty(gsm-tty);
 gsm_dlci_begin_close(dlci);
 if (test_bit(ASYNCB_INITIALIZED, dlci-port.flags)) {
 if (C_HUPCL(tty))
 tty_port_lower_dtr_rts(dlci-port);
 }
+pm_runtime_put_tty(gsm-tty);
+
 tty_port_close_end(dlci-port, tty);
 tty_port_tty_set(dlci-port, NULL);
 return;
@@ -3012,10 +3048,12 @@ static int gsmtty_write(struct tty_struct *tty, const 
unsigned char *buf,
 struct gsm_dlci *dlci = tty-driver_data;
 if (dlci-state == DLCI_CLOSED)
 return -EINVAL;
+pm_runtime_get_sync_tty(dlci-gsm-tty);
 /* Stuff the bytes into the fifo queue */
 sent = kfifo_in_locked(dlci-fifo, buf, len, dlci-lock);
 /* Need to kick the channel */
 gsm_dlci_data_kick(dlci);
+pm_runtime_put_tty(dlci-gsm-tty);
 return sent;
 }
 
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V2 2/3] cdc-acm: call usb_autopm_get_interface_upgrade in acm_tty_write

2015-05-26 Thread Zhang, Yanmin
acm device might be used as ldisc device by n_gsm driver.
gsmtty_write and other gsm functions calls acm_tty_write
indirectly while they holds spinlocks.

Meanwhile, application might access ACM tty device directly.

Here we choose to call usb_autopm_get_interface_upgrade instead of
usb_autopm_get_interface_async to make sure above 2 scenarios can
work well.

Signed-off-by: Zhang Yanmin yanmin.zh...@intel.com

---

 drivers/usb/class/cdc-acm.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 5c8f581..6ad85a3 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -689,10 +689,15 @@ static int acm_tty_write(struct tty_struct *tty,
 
 dev_vdbg(acm-data-dev, %s - count %d\n, __func__, count);
 
+stat = usb_autopm_get_interface_upgrade(acm-control);
+if (stat)
+return stat;
+
 spin_lock_irqsave(acm-write_lock, flags);
 wbn = acm_wb_alloc(acm);
 if (wbn  0) {
 spin_unlock_irqrestore(acm-write_lock, flags);
+usb_autopm_put_interface(acm-control);
 return 0;
 }
 wb = acm-wb[wbn];
@@ -700,6 +705,7 @@ static int acm_tty_write(struct tty_struct *tty,
 if (!acm-dev) {
 wb-use = 0;
 spin_unlock_irqrestore(acm-write_lock, flags);
+usb_autopm_put_interface(acm-control);
 return -ENODEV;
 }
 
@@ -708,13 +714,6 @@ static int acm_tty_write(struct tty_struct *tty,
 memcpy(wb-buf, buf, count);
 wb-len = count;
 
-stat = usb_autopm_get_interface_async(acm-control);
-if (stat) {
-wb-use = 0;
-spin_unlock_irqrestore(acm-write_lock, flags);
-return stat;
-}
-
 if (acm-susp_count) {
 usb_anchor_urb(wb-urb, acm-delayed);
 spin_unlock_irqrestore(acm-write_lock, flags);
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 3/3] n_gsm: wake up ldisc tty before using it

2015-05-26 Thread Zhang, Yanmin
On 2015/5/27 11:02, Greg KH wrote:
 On Wed, May 27, 2015 at 10:50:01AM +0800, Zhang, Yanmin wrote:
 Wake up ldisc device before calling its driver to access the device.

 Signed-off-by: Zhang Yanmin yanmin.zh...@intel.com

 ---

  drivers/tty/n_gsm.c | 40 +++-
  1 file changed, 39 insertions(+), 1 deletion(-)

 diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
 index 2c34c32..40671fa 100644
 --- a/drivers/tty/n_gsm.c
 +++ b/drivers/tty/n_gsm.c
 @@ -62,6 +62,7 @@
  #include linux/netdevice.h
  #include linux/etherdevice.h
  #include linux/gsmmux.h
 +#include linux/pm_runtime.h
  
  static int debug;
  module_param(debug, int, 0600);
 @@ -555,6 +556,27 @@ static int gsm_stuff_frame(const u8 *input, u8 *output, 
 int len)
  return olen;
  }
  
 +static int pm_runtime_get_sync_tty(struct tty_struct *tty)
 +{
 +int ret = 0;
 +
 +/*Wakeup parent as tty itself doesn't enable runtime*/
 No spaces in your comment?
I will add it.


 Anyway, this is corrupted and can't be applied, please fix up your email
 client and try it again...
I check the patch by scripts/checkpatch.pl and everything seems good.

I use Thunderbird 31.7.0, the latest. It auto updates to the latest version. 
Perhaps some new
config options changed something.

It seems email client converts some tab to space automatically.

Yanmin

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V3 1/3] usb: add function usb_autopm_get_interface_upgrade

2015-05-26 Thread Zhang, Yanmin
From: Zhang Yanmin yanmin.zh...@intel.com

Some usb driver has a specific requirement. Their critical functions
might be called under both atomic environment and non-atomic environment.

If it's under atomic environment, the driver can wake up the device
by calling pm_runtime_get_sync directly.

If it's under non-atomic environment, the function's caller need wake
up the device before the function accesses the device.

The patch adds usb_autopm_get_interface_upgrade, a new function to
support above capability.

Signed-off-by: Zhang Yanmin yanmin.zh...@intel.com
---
 drivers/usb/core/driver.c | 54 +++
 include/linux/usb.h   |  3 +++
 2 files changed, 57 insertions(+)

diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c
index 818369a..5d6f9ee 100644
--- a/drivers/usb/core/driver.c
+++ b/drivers/usb/core/driver.c
@@ -1684,6 +1684,60 @@ int usb_autopm_get_interface(struct usb_interface *intf)
 EXPORT_SYMBOL_GPL(usb_autopm_get_interface);
 
 /**
+ * usb_autopm_get_interface_upgrade - increment a USB interface's PM-usage
+ * counter
+ * @intf: the usb_interface whose counter should be incremented
+ *
+ * This routine should be called by an interface driver when it wants to
+ * use @intf and needs to guarantee that it is not suspended.  In addition,
+ * the routine prevents @intf from being autosuspended subsequently.  (Note
+ * that this will not prevent suspend events originating in the PM core.)
+ * This prevention will persist until usb_autopm_put_interface() is called
+ * or @intf is unbound.  A typical example would be a character-device
+ * driver when its device file is opened.
+ *
+ * Comparing with usb_autopm_get_interface, usb_autopm_get_interface_upgrade
+ * is more careful when resuming the device.
+ * 1) The caller's caller already resumes resume the device and hold spinlocks.
+ * usb_autopm_get_interface_upgrade couldn't call pm_runtime_get_sync;
+ * 2) The caller's caller doesn't resume the device.
+ * usb_autopm_get_interface_upgrade has to resume the device before going 
ahead.
+ *
+ * @intf's usage counter is incremented to prevent subsequent autosuspends.
+ * However if the autoresume fails then the counter is re-decremented.
+ *
+ * This routine can run only in process context.
+ *
+ * Return: 0 on success.
+ */
+int usb_autopm_get_interface_upgrade(struct usb_interface *intf)
+{
+   int status = 0;
+
+   pm_runtime_get(intf-dev);
+   if (!pm_runtime_active(intf-dev)) {
+   /* If not active, next _get_sync wakes device up*/
+   status = pm_runtime_get_sync(intf-dev);
+   /*
+* If it's active, next _put_sync wouldn't
+* really put it to sleep as the 1st _get
+* keeps the device active.
+*/
+   pm_runtime_put_sync(intf-dev);
+   if (status  0)
+   pm_runtime_put(intf-dev);
+   } else
+   atomic_inc(intf-pm_usage_cnt);
+   dev_vdbg(intf-dev, %s: cnt %d - %d\n,
+   __func__, atomic_read(intf-dev.power.usage_count),
+   status);
+   if (status  0)
+   status = 0;
+   return status;
+}
+EXPORT_SYMBOL_GPL(usb_autopm_get_interface_upgrade);
+
+/**
  * usb_autopm_get_interface_async - increment a USB interface's PM-usage 
counter
  * @intf: the usb_interface whose counter should be incremented
  *
diff --git a/include/linux/usb.h b/include/linux/usb.h
index 447fe29..0a8a44a 100644
--- a/include/linux/usb.h
+++ b/include/linux/usb.h
@@ -663,6 +663,7 @@ extern void usb_enable_autosuspend(struct usb_device *udev);
 extern void usb_disable_autosuspend(struct usb_device *udev);
 
 extern int usb_autopm_get_interface(struct usb_interface *intf);
+extern int usb_autopm_get_interface_upgrade(struct usb_interface *intf);
 extern void usb_autopm_put_interface(struct usb_interface *intf);
 extern int usb_autopm_get_interface_async(struct usb_interface *intf);
 extern void usb_autopm_put_interface_async(struct usb_interface *intf);
@@ -683,6 +684,8 @@ static inline int usb_disable_autosuspend(struct usb_device 
*udev)
 
 static inline int usb_autopm_get_interface(struct usb_interface *intf)
 { return 0; }
+static inline int usb_autopm_get_interface_upgrade(struct usb_interface *intf)
+{ return 0; }
 static inline int usb_autopm_get_interface_async(struct usb_interface *intf)
 { return 0; }
 
-- 
1.9.1


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V3 3/3] n_gsm: wake up ldisc tty before using it

2015-05-26 Thread Zhang, Yanmin
From: Zhang Yanmin yanmin.zh...@intel.com

Wake up ldisc device before calling its driver to access the device.

Signed-off-by: Zhang Yanmin yanmin.zh...@intel.com
---
 drivers/tty/n_gsm.c | 40 +++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
index 2c34c32..f887df6 100644
--- a/drivers/tty/n_gsm.c
+++ b/drivers/tty/n_gsm.c
@@ -62,6 +62,7 @@
 #include linux/netdevice.h
 #include linux/etherdevice.h
 #include linux/gsmmux.h
+#include linux/pm_runtime.h
 
 static int debug;
 module_param(debug, int, 0600);
@@ -555,6 +556,27 @@ static int gsm_stuff_frame(const u8 *input, u8 *output, 
int len)
return olen;
 }
 
+static int pm_runtime_get_sync_tty(struct tty_struct *tty)
+{
+   int ret = 0;
+
+   /* Wakeup parent as tty itself doesn't enable runtime */
+   if (tty-dev-parent)
+   ret = pm_runtime_get_sync(tty-dev-parent);
+
+   return ret;
+}
+
+static int pm_runtime_put_tty(struct tty_struct *tty)
+{
+   int ret = 0;
+
+   if (tty-dev-parent)
+   ret = pm_runtime_put(tty-dev-parent);
+
+   return ret;
+}
+
 /**
  * gsm_send-   send a control frame
  * @gsm: our GSM mux
@@ -1511,7 +1533,9 @@ static void gsm_dlci_begin_open(struct gsm_dlci *dlci)
return;
dlci-retries = gsm-n2;
dlci-state = DLCI_OPENING;
+   pm_runtime_get_sync_tty(gsm-tty);
gsm_command(dlci-gsm, dlci-addr, SABM|PF);
+   pm_runtime_put_tty(gsm-tty);
mod_timer(dlci-t1, jiffies + gsm-t1 * HZ / 100);
 }
 
@@ -1533,7 +1557,9 @@ static void gsm_dlci_begin_close(struct gsm_dlci *dlci)
return;
dlci-retries = gsm-n2;
dlci-state = DLCI_CLOSING;
+   pm_runtime_get_sync_tty(gsm-tty);
gsm_command(dlci-gsm, dlci-addr, DISC|PF);
+   pm_runtime_put_tty(gsm-tty);
mod_timer(dlci-t1, jiffies + gsm-t1 * HZ / 100);
 }
 
@@ -2286,7 +2312,9 @@ static void gsmld_receive_buf(struct tty_struct *tty, 
const unsigned char *cp,
flags = *f++;
switch (flags) {
case TTY_NORMAL:
+   pm_runtime_get_sync_tty(gsm-tty);
gsm-receive(gsm, *dp);
+   pm_runtime_put_tty(gsm-tty);
break;
case TTY_OVERRUN:
case TTY_BREAK:
@@ -2957,6 +2985,7 @@ static int gsmtty_open(struct tty_struct *tty, struct 
file *filp)
 {
struct gsm_dlci *dlci = tty-driver_data;
struct tty_port *port = dlci-port;
+   int ret;
 
port-count++;
tty_port_tty_set(port, tty);
@@ -2968,7 +2997,11 @@ static int gsmtty_open(struct tty_struct *tty, struct 
file *filp)
/* Start sending off SABM messages */
gsm_dlci_begin_open(dlci);
/* And wait for virtual carrier */
-   return tty_port_block_til_ready(port, tty, filp);
+   pm_runtime_get_sync_tty(dlci-gsm-tty);
+   ret = tty_port_block_til_ready(port, tty, filp);
+   pm_runtime_put_tty(dlci-gsm-tty);
+
+   return ret;
 }
 
 static void gsmtty_close(struct tty_struct *tty, struct file *filp)
@@ -2986,11 +3019,14 @@ static void gsmtty_close(struct tty_struct *tty, struct 
file *filp)
gsm = dlci-gsm;
if (tty_port_close_start(dlci-port, tty, filp) == 0)
return;
+   pm_runtime_get_sync_tty(gsm-tty);
gsm_dlci_begin_close(dlci);
if (test_bit(ASYNCB_INITIALIZED, dlci-port.flags)) {
if (C_HUPCL(tty))
tty_port_lower_dtr_rts(dlci-port);
}
+   pm_runtime_put_tty(gsm-tty);
+
tty_port_close_end(dlci-port, tty);
tty_port_tty_set(dlci-port, NULL);
return;
@@ -3012,10 +3048,12 @@ static int gsmtty_write(struct tty_struct *tty, const 
unsigned char *buf,
struct gsm_dlci *dlci = tty-driver_data;
if (dlci-state == DLCI_CLOSED)
return -EINVAL;
+   pm_runtime_get_sync_tty(dlci-gsm-tty);
/* Stuff the bytes into the fifo queue */
sent = kfifo_in_locked(dlci-fifo, buf, len, dlci-lock);
/* Need to kick the channel */
gsm_dlci_data_kick(dlci);
+   pm_runtime_put_tty(dlci-gsm-tty);
return sent;
 }
 
-- 
1.9.1


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V3 2/3] cdc-acm: call usb_autopm_get_interface_upgrade in acm_tty_write

2015-05-26 Thread Zhang, Yanmin
From: Zhang Yanmin yanmin.zh...@intel.com

acm device might be used as ldisc device by n_gsm driver.
gsmtty_write and other gsm functions calls acm_tty_write
indirectly while they holds spinlocks.

Meanwhile, application might access ACM tty device directly.

Here we choose to call usb_autopm_get_interface_upgrade instead of
usb_autopm_get_interface_async to make sure above 2 scenarios can
work well.

Signed-off-by: Zhang Yanmin yanmin.zh...@intel.com
---
 drivers/usb/class/cdc-acm.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 5c8f581..6ad85a3 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -689,10 +689,15 @@ static int acm_tty_write(struct tty_struct *tty,
 
dev_vdbg(acm-data-dev, %s - count %d\n, __func__, count);
 
+   stat = usb_autopm_get_interface_upgrade(acm-control);
+   if (stat)
+   return stat;
+
spin_lock_irqsave(acm-write_lock, flags);
wbn = acm_wb_alloc(acm);
if (wbn  0) {
spin_unlock_irqrestore(acm-write_lock, flags);
+   usb_autopm_put_interface(acm-control);
return 0;
}
wb = acm-wb[wbn];
@@ -700,6 +705,7 @@ static int acm_tty_write(struct tty_struct *tty,
if (!acm-dev) {
wb-use = 0;
spin_unlock_irqrestore(acm-write_lock, flags);
+   usb_autopm_put_interface(acm-control);
return -ENODEV;
}
 
@@ -708,13 +714,6 @@ static int acm_tty_write(struct tty_struct *tty,
memcpy(wb-buf, buf, count);
wb-len = count;
 
-   stat = usb_autopm_get_interface_async(acm-control);
-   if (stat) {
-   wb-use = 0;
-   spin_unlock_irqrestore(acm-write_lock, flags);
-   return stat;
-   }
-
if (acm-susp_count) {
usb_anchor_urb(wb-urb, acm-delayed);
spin_unlock_irqrestore(acm-write_lock, flags);
-- 
1.9.1


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V3 0/3] cdc-acm: fix incorrect runtime wakeup in acm_tty_write

2015-05-26 Thread Zhang, Yanmin
Resend as V1/V2 have email format issue. Sorry for bothering.

I use Thunderbird. It has no a button to enable LKML email simply. :)

V3: Change email config to resend.
Add a space in comment.

 ---

There is a scenario about cdc-acm utilization.Application opens
n_gsm tty and cdc-acm tty. cdc-acm tty connects to xhci device.
The application configures cdc-adm tty to n_gsm tty as ldisc tty.

n_gsm=cdc-acm=xhci driver

acm_tty_write can be called from n_gsm driver by ldisc connection,
and from application when application opens cdc-acm tty directly.
acm_tty_write wakes up the device by calling usb_autopm_get_interface_async,
which calls pm_runtime_get. However, pm_runtime_get can't wake up
the device before returning as it's an async wake up. Then, acm_tty_write
might access the device when it is off.

The patchset fixes it by:
1) add a new function usb_autopm_get_interface_upgrade to deal with
above 2 requirements;
2) wake up device in n_gsm driver if n_gsm drivers calls cdc-acm driver;

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V3 0/3] cdc-acm: fix incorrect runtime wakeup in acm_tty_write

2015-05-26 Thread Zhang, Yanmin
On 2015/5/27 12:13, Zhang, Yanmin wrote:
 Resend as V1/V2 have email format issue. Sorry for bothering.

Greg,

We have to abandon this patchset. Zhuang Jin Can, a USB expert,
reviewed the patches. He says acm_tty_write already considers
it carefully.

acm_tty_write puts the urb to a delayed queue when acm-susp_count
is not 0. acm_suspend adds 1 to acm-susp_count and acm_resume
decreases 1 from it .

Sorry for the bothering. Also thank Jin Can for the comments.

Yanmin


 I use Thunderbird. It has no a button to enable LKML email simply. :)

 V3: Change email config to resend.
   Add a space in comment.

  ---

 There is a scenario about cdc-acm utilization.Application opens
 n_gsm tty and cdc-acm tty. cdc-acm tty connects to xhci device.
 The application configures cdc-adm tty to n_gsm tty as ldisc tty.

 n_gsm=cdc-acm=xhci driver

 acm_tty_write can be called from n_gsm driver by ldisc connection,
 and from application when application opens cdc-acm tty directly.
 acm_tty_write wakes up the device by calling usb_autopm_get_interface_async,
 which calls pm_runtime_get. However, pm_runtime_get can't wake up
 the device before returning as it's an async wake up. Then, acm_tty_write
 might access the device when it is off.

 The patchset fixes it by:
 1) add a new function usb_autopm_get_interface_upgrade to deal with
 above 2 requirements;
 2) wake up device in n_gsm driver if n_gsm drivers calls cdc-acm driver;

 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3/3] n_gsm: wake up ldisc tty before using it

2015-05-25 Thread Zhang, Yanmin

Wake up ldisc device before calling its driver to access the device.

Signed-off-by: Zhang Yanmin 

---

 drivers/tty/n_gsm.c | 40 +++-

 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c

index 2c34c32..40671fa 100644

--- a/drivers/tty/n_gsm.c

+++ b/drivers/tty/n_gsm.c

@@ -62,6 +62,7 @@

 #include 

 #include 

 #include 

+#include 

 


 static int debug;

 module_param(debug, int, 0600);

@@ -555,6 +556,27 @@ static int gsm_stuff_frame(const u8 *input, u8 *output, 
int len)

 return olen;

 }

 


+static int pm_runtime_get_sync_tty(struct tty_struct *tty)

+{

+int ret = 0;

+

+/*Wakeup parent as tty itself doesn't enable runtime*/

+if (tty->dev->parent)

+ret = pm_runtime_get_sync(tty->dev->parent);

+

+return ret;

+}

+

+static int pm_runtime_put_tty(struct tty_struct *tty)

+{

+int ret = 0;

+

+if (tty->dev->parent)

+ret = pm_runtime_put(tty->dev->parent);

+

+return ret;

+}

+

 /**

  *gsm_send-send a control frame

  *@gsm: our GSM mux

@@ -1511,7 +1533,9 @@ static void gsm_dlci_begin_open(struct gsm_dlci *dlci)

 return;

 dlci->retries = gsm->n2;

 dlci->state = DLCI_OPENING;

+pm_runtime_get_sync_tty(gsm->tty);

 gsm_command(dlci->gsm, dlci->addr, SABM|PF);

+pm_runtime_put_tty(gsm->tty);

 mod_timer(>t1, jiffies + gsm->t1 * HZ / 100);

 }

 


@@ -1533,7 +1557,9 @@ static void gsm_dlci_begin_close(struct gsm_dlci *dlci)

 return;

 dlci->retries = gsm->n2;

 dlci->state = DLCI_CLOSING;

+pm_runtime_get_sync_tty(gsm->tty);

 gsm_command(dlci->gsm, dlci->addr, DISC|PF);

+pm_runtime_put_tty(gsm->tty);

 mod_timer(>t1, jiffies + gsm->t1 * HZ / 100);

 }

 


@@ -2286,7 +2312,9 @@ static void gsmld_receive_buf(struct tty_struct *tty, 
const unsigned char *cp,

 flags = *f++;

 switch (flags) {

 case TTY_NORMAL:

+pm_runtime_get_sync_tty(gsm->tty);

 gsm->receive(gsm, *dp);

+pm_runtime_put_tty(gsm->tty);

 break;

 case TTY_OVERRUN:

 case TTY_BREAK:

@@ -2957,6 +2985,7 @@ static int gsmtty_open(struct tty_struct *tty, struct 
file *filp)

 {

 struct gsm_dlci *dlci = tty->driver_data;

 struct tty_port *port = >port;

+int ret;

 


 port->count++;

 tty_port_tty_set(port, tty);

@@ -2968,7 +2997,11 @@ static int gsmtty_open(struct tty_struct *tty, struct 
file *filp)

 /* Start sending off SABM messages */

 gsm_dlci_begin_open(dlci);

 /* And wait for virtual carrier */

-return tty_port_block_til_ready(port, tty, filp);

+pm_runtime_get_sync_tty(dlci->gsm->tty);

+ret = tty_port_block_til_ready(port, tty, filp);

+pm_runtime_put_tty(dlci->gsm->tty);

+

+return ret;

 }

 


 static void gsmtty_close(struct tty_struct *tty, struct file *filp)

@@ -2986,11 +3019,14 @@ static void gsmtty_close(struct tty_struct *tty, struct 
file *filp)

 gsm = dlci->gsm;

 if (tty_port_close_start(>port, tty, filp) == 0)

 return;

+pm_runtime_get_sync_tty(gsm->tty);

 gsm_dlci_begin_close(dlci);

 if (test_bit(ASYNCB_INITIALIZED, >port.flags)) {

 if (C_HUPCL(tty))

 tty_port_lower_dtr_rts(>port);

 }

+pm_runtime_put_tty(gsm->tty);

+

 tty_port_close_end(>port, tty);

 tty_port_tty_set(>port, NULL);

 return;

@@ -3012,10 +3048,12 @@ static int gsmtty_write(struct tty_struct *tty, const 
unsigned char *buf,

 struct gsm_dlci *dlci = tty->driver_data;

 if (dlci->state == DLCI_CLOSED)

 return -EINVAL;

+pm_runtime_get_sync_tty(dlci->gsm->tty);

 /* Stuff the bytes into the fifo queue */

 sent = kfifo_in_locked(dlci->fifo, buf, len, >lock);

 /* Need to kick the channel */

 gsm_dlci_data_kick(dlci);

+pm_runtime_put_tty(dlci->gsm->tty);

 return sent;

 }

 


--

1.9.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/3] cdc-acm: call usb_autopm_get_interface_upgrade in, acm_tty_write

2015-05-25 Thread Zhang, Yanmin

acm device might be used as ldisc device by n_gsm driver.

gsmtty_write and other gsm functions calls acm_tty_write

indirectly while they holds spinlocks.

Meanwhile, application might access ACM tty device directly.

Here we choose to call usb_autopm_get_interface_upgrade instead of

usb_autopm_get_interface_async to make sure above 2 scenarios can

work well.

Signed-off-by: Zhang Yanmin 

---

---
 drivers/usb/class/cdc-acm.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 5c8f581..6ad85a3 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -689,10 +689,15 @@ static int acm_tty_write(struct tty_struct *tty,
 
 	dev_vdbg(>data->dev, "%s - count %d\n", __func__, count);
 
+	stat = usb_autopm_get_interface_upgrade(acm->control);

+   if (stat)
+   return stat;
+
spin_lock_irqsave(>write_lock, flags);
wbn = acm_wb_alloc(acm);
if (wbn < 0) {
spin_unlock_irqrestore(>write_lock, flags);
+   usb_autopm_put_interface(acm->control);
return 0;
}
wb = >wb[wbn];
@@ -700,6 +705,7 @@ static int acm_tty_write(struct tty_struct *tty,
if (!acm->dev) {
wb->use = 0;
spin_unlock_irqrestore(>write_lock, flags);
+   usb_autopm_put_interface(acm->control);
return -ENODEV;
}
 
@@ -708,13 +714,6 @@ static int acm_tty_write(struct tty_struct *tty,

memcpy(wb->buf, buf, count);
wb->len = count;
 
-	stat = usb_autopm_get_interface_async(acm->control);

-   if (stat) {
-   wb->use = 0;
-   spin_unlock_irqrestore(>write_lock, flags);
-   return stat;
-   }
-
if (acm->susp_count) {
usb_anchor_urb(wb->urb, >delayed);
spin_unlock_irqrestore(>write_lock, flags);
--
1.9.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/3] usb: add function usb_autopm_get_interface_upgrade

2015-05-25 Thread Zhang, Yanmin

Some usb driver has a specific requirement. Their critical functions

might be called under both atomic environment and non-atomic environment.

If it's under atomic environment, the driver can wake up the device

by calling pm_runtime_get_sync directly.

If it's under non-atomic environment, the function's caller need wake

up the device before the function accesses the device.

The patch adds usb_autopm_get_interface_upgrade, a new function to

support above capability.

Signed-off-by: Zhang Yanmin 

---

 drivers/usb/core/driver.c | 53 +++

 include/linux/usb.h   |  3 +++

 2 files changed, 56 insertions(+)

diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c

index 818369a..d07fd8d 100644

--- a/drivers/usb/core/driver.c

+++ b/drivers/usb/core/driver.c

@@ -1684,6 +1684,59 @@ int usb_autopm_get_interface(struct usb_interface *intf)

 EXPORT_SYMBOL_GPL(usb_autopm_get_interface);

 


 /**

+ * usb_autopm_get_interface_retry - increment a USB interface's PM-usage 
counter

+ * @intf: the usb_interface whose counter should be incremented

+ *

+ * This routine should be called by an interface driver when it wants to

+ * use @intf and needs to guarantee that it is not suspended.  In addition,

+ * the routine prevents @intf from being autosuspended subsequently.  (Note

+ * that this will not prevent suspend events originating in the PM core.)

+ * This prevention will persist until usb_autopm_put_interface() is called

+ * or @intf is unbound.  A typical example would be a character-device

+ * driver when its device file is opened.

+ *

+ * Comparing with usb_autopm_get_interface, usb_autopm_get_interface_upgrade

+ * is more careful when resuming the device.

+ * 1) The caller's caller already resumes resume the device and hold spinlocks.

+ * usb_autopm_get_interface_upgrade couldn't call pm_runtime_get_sync;

+ * 2) The caller's caller doesn't resume the device.

+ * usb_autopm_get_interface_upgrade has to resume the device before going 
ahead.

+ *

+ * @intf's usage counter is incremented to prevent subsequent autosuspends.

+ * However if the autoresume fails then the counter is re-decremented.

+ *

+ * This routine can run only in process context.

+ *

+ * Return: 0 on success.

+ */

+int usb_autopm_get_interface_upgrade(struct usb_interface *intf)

+{

+intstatus = 0;

+

+pm_runtime_get(>dev);

+if (!pm_runtime_active(>dev)) {

+/* If not active, next _get_sync wakes device up*/

+status = pm_runtime_get_sync(>dev);

+/*

+ * If it's active, next _put_sync wouldn't

+ * really put it to sleep as the 1st _get

+ * keeps the device active.

+ */

+pm_runtime_put_sync(>dev);

+if (status < 0)

+pm_runtime_put(>dev);

+} else

+atomic_inc(>pm_usage_cnt);

+dev_vdbg(>dev, "%s: cnt %d -> %d\n",

+__func__, atomic_read(>dev.power.usage_count),

+status);

+if (status > 0)

+status = 0;

+return status;

+}

+EXPORT_SYMBOL_GPL(usb_autopm_get_interface_upgrade);

+

+/**

  * usb_autopm_get_interface_async - increment a USB interface's PM-usage 
counter

  * @intf: the usb_interface whose counter should be incremented

  *

diff --git a/include/linux/usb.h b/include/linux/usb.h

index 447fe29..0a8a44a 100644

--- a/include/linux/usb.h

+++ b/include/linux/usb.h

@@ -663,6 +663,7 @@ extern void usb_enable_autosuspend(struct usb_device *udev);

 extern void usb_disable_autosuspend(struct usb_device *udev);

 


 extern int usb_autopm_get_interface(struct usb_interface *intf);

+extern int usb_autopm_get_interface_upgrade(struct usb_interface *intf);

 extern void usb_autopm_put_interface(struct usb_interface *intf);

 extern int usb_autopm_get_interface_async(struct usb_interface *intf);

 extern void usb_autopm_put_interface_async(struct usb_interface *intf);

@@ -683,6 +684,8 @@ static inline int usb_disable_autosuspend(struct usb_device 
*udev)

 


 static inline int usb_autopm_get_interface(struct usb_interface *intf)

 { return 0; }

+static inline int usb_autopm_get_interface_upgrade(struct usb_interface *intf)

+{ return 0; }

 static inline int usb_autopm_get_interface_async(struct usb_interface *intf)

 { return 0; }

 


--

1.9.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/3] cdc-acm: fix incorrect runtime wakeup in acm_tty_write

2015-05-25 Thread Zhang, Yanmin

There is a scenario about cdc-acm utilization.Application opens

n_gsm tty and cdc-acm tty. cdc-acm tty connects to xhci device.

The application configures cdc-adm tty to n_gsm tty as ldisc tty.

n_gsm=>cdc-acm=>xhci driver

acm_tty_write can be called from n_gsm driver by ldisc connection,

and from application when application opens cdc-acm tty directly.

acm_tty_write wakes up the device by calling usb_autopm_get_interface_async,

which calls pm_runtime_get. However, pm_runtime_get can't wake up

the device before returning as it's an async wake up. Then, acm_tty_write

might access the device when it is off.

The patchset fixes it by:

1) add a new function usb_autopm_get_interface_upgrade to deal with

above 2 requirements;

2) wake up device in n_gsm driver if n_gsm drivers calls cdc-acm driver;


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3/3] n_gsm: wake up ldisc tty before using it

2015-05-25 Thread Zhang, Yanmin

Wake up ldisc device before calling its driver to access the device.

Signed-off-by: Zhang Yanmin yanmin.zh...@intel.com

---

 drivers/tty/n_gsm.c | 40 +++-

 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c

index 2c34c32..40671fa 100644

--- a/drivers/tty/n_gsm.c

+++ b/drivers/tty/n_gsm.c

@@ -62,6 +62,7 @@

 #include linux/netdevice.h

 #include linux/etherdevice.h

 #include linux/gsmmux.h

+#include linux/pm_runtime.h

 


 static int debug;

 module_param(debug, int, 0600);

@@ -555,6 +556,27 @@ static int gsm_stuff_frame(const u8 *input, u8 *output, 
int len)

 return olen;

 }

 


+static int pm_runtime_get_sync_tty(struct tty_struct *tty)

+{

+int ret = 0;

+

+/*Wakeup parent as tty itself doesn't enable runtime*/

+if (tty-dev-parent)

+ret = pm_runtime_get_sync(tty-dev-parent);

+

+return ret;

+}

+

+static int pm_runtime_put_tty(struct tty_struct *tty)

+{

+int ret = 0;

+

+if (tty-dev-parent)

+ret = pm_runtime_put(tty-dev-parent);

+

+return ret;

+}

+

 /**

  *gsm_send-send a control frame

  *@gsm: our GSM mux

@@ -1511,7 +1533,9 @@ static void gsm_dlci_begin_open(struct gsm_dlci *dlci)

 return;

 dlci-retries = gsm-n2;

 dlci-state = DLCI_OPENING;

+pm_runtime_get_sync_tty(gsm-tty);

 gsm_command(dlci-gsm, dlci-addr, SABM|PF);

+pm_runtime_put_tty(gsm-tty);

 mod_timer(dlci-t1, jiffies + gsm-t1 * HZ / 100);

 }

 


@@ -1533,7 +1557,9 @@ static void gsm_dlci_begin_close(struct gsm_dlci *dlci)

 return;

 dlci-retries = gsm-n2;

 dlci-state = DLCI_CLOSING;

+pm_runtime_get_sync_tty(gsm-tty);

 gsm_command(dlci-gsm, dlci-addr, DISC|PF);

+pm_runtime_put_tty(gsm-tty);

 mod_timer(dlci-t1, jiffies + gsm-t1 * HZ / 100);

 }

 


@@ -2286,7 +2312,9 @@ static void gsmld_receive_buf(struct tty_struct *tty, 
const unsigned char *cp,

 flags = *f++;

 switch (flags) {

 case TTY_NORMAL:

+pm_runtime_get_sync_tty(gsm-tty);

 gsm-receive(gsm, *dp);

+pm_runtime_put_tty(gsm-tty);

 break;

 case TTY_OVERRUN:

 case TTY_BREAK:

@@ -2957,6 +2985,7 @@ static int gsmtty_open(struct tty_struct *tty, struct 
file *filp)

 {

 struct gsm_dlci *dlci = tty-driver_data;

 struct tty_port *port = dlci-port;

+int ret;

 


 port-count++;

 tty_port_tty_set(port, tty);

@@ -2968,7 +2997,11 @@ static int gsmtty_open(struct tty_struct *tty, struct 
file *filp)

 /* Start sending off SABM messages */

 gsm_dlci_begin_open(dlci);

 /* And wait for virtual carrier */

-return tty_port_block_til_ready(port, tty, filp);

+pm_runtime_get_sync_tty(dlci-gsm-tty);

+ret = tty_port_block_til_ready(port, tty, filp);

+pm_runtime_put_tty(dlci-gsm-tty);

+

+return ret;

 }

 


 static void gsmtty_close(struct tty_struct *tty, struct file *filp)

@@ -2986,11 +3019,14 @@ static void gsmtty_close(struct tty_struct *tty, struct 
file *filp)

 gsm = dlci-gsm;

 if (tty_port_close_start(dlci-port, tty, filp) == 0)

 return;

+pm_runtime_get_sync_tty(gsm-tty);

 gsm_dlci_begin_close(dlci);

 if (test_bit(ASYNCB_INITIALIZED, dlci-port.flags)) {

 if (C_HUPCL(tty))

 tty_port_lower_dtr_rts(dlci-port);

 }

+pm_runtime_put_tty(gsm-tty);

+

 tty_port_close_end(dlci-port, tty);

 tty_port_tty_set(dlci-port, NULL);

 return;

@@ -3012,10 +3048,12 @@ static int gsmtty_write(struct tty_struct *tty, const 
unsigned char *buf,

 struct gsm_dlci *dlci = tty-driver_data;

 if (dlci-state == DLCI_CLOSED)

 return -EINVAL;

+pm_runtime_get_sync_tty(dlci-gsm-tty);

 /* Stuff the bytes into the fifo queue */

 sent = kfifo_in_locked(dlci-fifo, buf, len, dlci-lock);

 /* Need to kick the channel */

 gsm_dlci_data_kick(dlci);

+pm_runtime_put_tty(dlci-gsm-tty);

 return sent;

 }

 


--

1.9.1


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/3] cdc-acm: call usb_autopm_get_interface_upgrade in, acm_tty_write

2015-05-25 Thread Zhang, Yanmin

acm device might be used as ldisc device by n_gsm driver.

gsmtty_write and other gsm functions calls acm_tty_write

indirectly while they holds spinlocks.

Meanwhile, application might access ACM tty device directly.

Here we choose to call usb_autopm_get_interface_upgrade instead of

usb_autopm_get_interface_async to make sure above 2 scenarios can

work well.

Signed-off-by: Zhang Yanmin yanmin.zh...@intel.com

---

---
 drivers/usb/class/cdc-acm.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 5c8f581..6ad85a3 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -689,10 +689,15 @@ static int acm_tty_write(struct tty_struct *tty,
 
 	dev_vdbg(acm-data-dev, %s - count %d\n, __func__, count);
 
+	stat = usb_autopm_get_interface_upgrade(acm-control);

+   if (stat)
+   return stat;
+
spin_lock_irqsave(acm-write_lock, flags);
wbn = acm_wb_alloc(acm);
if (wbn  0) {
spin_unlock_irqrestore(acm-write_lock, flags);
+   usb_autopm_put_interface(acm-control);
return 0;
}
wb = acm-wb[wbn];
@@ -700,6 +705,7 @@ static int acm_tty_write(struct tty_struct *tty,
if (!acm-dev) {
wb-use = 0;
spin_unlock_irqrestore(acm-write_lock, flags);
+   usb_autopm_put_interface(acm-control);
return -ENODEV;
}
 
@@ -708,13 +714,6 @@ static int acm_tty_write(struct tty_struct *tty,

memcpy(wb-buf, buf, count);
wb-len = count;
 
-	stat = usb_autopm_get_interface_async(acm-control);

-   if (stat) {
-   wb-use = 0;
-   spin_unlock_irqrestore(acm-write_lock, flags);
-   return stat;
-   }
-
if (acm-susp_count) {
usb_anchor_urb(wb-urb, acm-delayed);
spin_unlock_irqrestore(acm-write_lock, flags);
--
1.9.1


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/3] usb: add function usb_autopm_get_interface_upgrade

2015-05-25 Thread Zhang, Yanmin

Some usb driver has a specific requirement. Their critical functions

might be called under both atomic environment and non-atomic environment.

If it's under atomic environment, the driver can wake up the device

by calling pm_runtime_get_sync directly.

If it's under non-atomic environment, the function's caller need wake

up the device before the function accesses the device.

The patch adds usb_autopm_get_interface_upgrade, a new function to

support above capability.

Signed-off-by: Zhang Yanmin yanmin.zh...@intel.com

---

 drivers/usb/core/driver.c | 53 +++

 include/linux/usb.h   |  3 +++

 2 files changed, 56 insertions(+)

diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c

index 818369a..d07fd8d 100644

--- a/drivers/usb/core/driver.c

+++ b/drivers/usb/core/driver.c

@@ -1684,6 +1684,59 @@ int usb_autopm_get_interface(struct usb_interface *intf)

 EXPORT_SYMBOL_GPL(usb_autopm_get_interface);

 


 /**

+ * usb_autopm_get_interface_retry - increment a USB interface's PM-usage 
counter

+ * @intf: the usb_interface whose counter should be incremented

+ *

+ * This routine should be called by an interface driver when it wants to

+ * use @intf and needs to guarantee that it is not suspended.  In addition,

+ * the routine prevents @intf from being autosuspended subsequently.  (Note

+ * that this will not prevent suspend events originating in the PM core.)

+ * This prevention will persist until usb_autopm_put_interface() is called

+ * or @intf is unbound.  A typical example would be a character-device

+ * driver when its device file is opened.

+ *

+ * Comparing with usb_autopm_get_interface, usb_autopm_get_interface_upgrade

+ * is more careful when resuming the device.

+ * 1) The caller's caller already resumes resume the device and hold spinlocks.

+ * usb_autopm_get_interface_upgrade couldn't call pm_runtime_get_sync;

+ * 2) The caller's caller doesn't resume the device.

+ * usb_autopm_get_interface_upgrade has to resume the device before going 
ahead.

+ *

+ * @intf's usage counter is incremented to prevent subsequent autosuspends.

+ * However if the autoresume fails then the counter is re-decremented.

+ *

+ * This routine can run only in process context.

+ *

+ * Return: 0 on success.

+ */

+int usb_autopm_get_interface_upgrade(struct usb_interface *intf)

+{

+intstatus = 0;

+

+pm_runtime_get(intf-dev);

+if (!pm_runtime_active(intf-dev)) {

+/* If not active, next _get_sync wakes device up*/

+status = pm_runtime_get_sync(intf-dev);

+/*

+ * If it's active, next _put_sync wouldn't

+ * really put it to sleep as the 1st _get

+ * keeps the device active.

+ */

+pm_runtime_put_sync(intf-dev);

+if (status  0)

+pm_runtime_put(intf-dev);

+} else

+atomic_inc(intf-pm_usage_cnt);

+dev_vdbg(intf-dev, %s: cnt %d - %d\n,

+__func__, atomic_read(intf-dev.power.usage_count),

+status);

+if (status  0)

+status = 0;

+return status;

+}

+EXPORT_SYMBOL_GPL(usb_autopm_get_interface_upgrade);

+

+/**

  * usb_autopm_get_interface_async - increment a USB interface's PM-usage 
counter

  * @intf: the usb_interface whose counter should be incremented

  *

diff --git a/include/linux/usb.h b/include/linux/usb.h

index 447fe29..0a8a44a 100644

--- a/include/linux/usb.h

+++ b/include/linux/usb.h

@@ -663,6 +663,7 @@ extern void usb_enable_autosuspend(struct usb_device *udev);

 extern void usb_disable_autosuspend(struct usb_device *udev);

 


 extern int usb_autopm_get_interface(struct usb_interface *intf);

+extern int usb_autopm_get_interface_upgrade(struct usb_interface *intf);

 extern void usb_autopm_put_interface(struct usb_interface *intf);

 extern int usb_autopm_get_interface_async(struct usb_interface *intf);

 extern void usb_autopm_put_interface_async(struct usb_interface *intf);

@@ -683,6 +684,8 @@ static inline int usb_disable_autosuspend(struct usb_device 
*udev)

 


 static inline int usb_autopm_get_interface(struct usb_interface *intf)

 { return 0; }

+static inline int usb_autopm_get_interface_upgrade(struct usb_interface *intf)

+{ return 0; }

 static inline int usb_autopm_get_interface_async(struct usb_interface *intf)

 { return 0; }

 


--

1.9.1


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/3] cdc-acm: fix incorrect runtime wakeup in acm_tty_write

2015-05-25 Thread Zhang, Yanmin

There is a scenario about cdc-acm utilization.Application opens

n_gsm tty and cdc-acm tty. cdc-acm tty connects to xhci device.

The application configures cdc-adm tty to n_gsm tty as ldisc tty.

n_gsm=cdc-acm=xhci driver

acm_tty_write can be called from n_gsm driver by ldisc connection,

and from application when application opens cdc-acm tty directly.

acm_tty_write wakes up the device by calling usb_autopm_get_interface_async,

which calls pm_runtime_get. However, pm_runtime_get can't wake up

the device before returning as it's an async wake up. Then, acm_tty_write

might access the device when it is off.

The patchset fixes it by:

1) add a new function usb_autopm_get_interface_upgrade to deal with

above 2 requirements;

2) wake up device in n_gsm driver if n_gsm drivers calls cdc-acm driver;


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] debugfs: keep the old valid mode value when no explicity specify it

2014-08-28 Thread Zhang, Yanmin

On 2014/8/28 23:10, Greg KH wrote:


On Thu, Aug 28, 2014 at 06:09:09PM +0800, Chen LinX wrote:

From: "Chen, LinX" 

When mount debugfs with no mode specifed after it's mounted, the mount
point mode will change to default mode(0700) even the mount operation was fail,
this will cause some issues like can't get binder info in android.

I don't understand, what did you do to get into this state?


Greg,

Thanks for your kind quick comments. The patch description can be improved.

We hit the issue when debugging a UIWDT issue. Android framework has a good
method to detect userspace hang and reports UIWDT issues. Android uses
client/server model. Clients communicates with servers by binder. binder has
debugfs interfaces. Some files show what threads are communicating with what
other threads. If one thread is blocked for a long time, we can find the
blocking chain from the binder info.

Since the error dumping process has no root access, booting process changes
debugfs mount dir mode to 0755. When UIWDT happens, the error dumping
process can read the info.

Unfortunately, some other scripts at booting try to mount debugfs for many 
times.
No matter if the double mounting fails or succeeds, debugfs_parse_options 
changes
the root inode's mode back to the default 0700. It means the effect of previous
mode changing to 0755 is lost. At UIWDT, the dumping process can't save binder
info to disk log files.



And why does binder care about debugfs?


Here we can keep the old valid mode if no explicity specify the mode
value and also change the mode value even the mount fails if the mode
value is specified.

I can't parse this sentence :(


Lin's patch checks if debugfs_mount_opts->mode is initiated.
If it is already, the patch would bypass the reinitiation, so the old mode
value is kept.

Yanmin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] debugfs: keep the old valid mode value when no explicity specify it

2014-08-28 Thread Zhang, Yanmin

On 2014/8/28 23:10, Greg KH wrote:


On Thu, Aug 28, 2014 at 06:09:09PM +0800, Chen LinX wrote:

From: Chen, LinX linx.z.c...@intel.com

When mount debugfs with no mode specifed after it's mounted, the mount
point mode will change to default mode(0700) even the mount operation was fail,
this will cause some issues like can't get binder info in android.

I don't understand, what did you do to get into this state?


Greg,

Thanks for your kind quick comments. The patch description can be improved.

We hit the issue when debugging a UIWDT issue. Android framework has a good
method to detect userspace hang and reports UIWDT issues. Android uses
client/server model. Clients communicates with servers by binder. binder has
debugfs interfaces. Some files show what threads are communicating with what
other threads. If one thread is blocked for a long time, we can find the
blocking chain from the binder info.

Since the error dumping process has no root access, booting process changes
debugfs mount dir mode to 0755. When UIWDT happens, the error dumping
process can read the info.

Unfortunately, some other scripts at booting try to mount debugfs for many 
times.
No matter if the double mounting fails or succeeds, debugfs_parse_options 
changes
the root inode's mode back to the default 0700. It means the effect of previous
mode changing to 0755 is lost. At UIWDT, the dumping process can't save binder
info to disk log files.



And why does binder care about debugfs?


Here we can keep the old valid mode if no explicity specify the mode
value and also change the mode value even the mount fails if the mode
value is specified.

I can't parse this sentence :(


Lin's patch checks if debugfs_mount_opts-mode is initiated.
If it is already, the patch would bypass the reinitiation, so the old mode
value is kept.

Yanmin

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] export the function kmap_flush_unused.

2014-08-12 Thread Zhang, Yanmin

On 2014/8/11 19:54, Peter Zijlstra wrote:

On Mon, Aug 11, 2014 at 01:26:45AM +, Sha, Ruibin wrote:

Hi Chintan,
Thank you very much for your timely and kindly response and comments.

Here is more detail about our Scenario:

 We have a big driver on Android product. The driver allocates lots of
 DDR pages. When applications mmap a file exported from the driver,
 driver would mmap the pages to the application space, usually with
 uncachable prot.
 On ia32/x86_64 arch, we have to avoid page cache alias issue. When
 driver allocates the pages, it would change page original mapping in
 page table with uncachable prot. Sometimes, the allocated page was
 used by kmap/kunmap. After kunmap, the page is still mapped in KMAP
 space. The entries in KMAP page table are not cleaned up until a
 kernel thread flushes the freed KMAP pages(usually it is woken up by 
kunmap).
 It means the driver need  force to flush the KMAP page table entries 
before mapping pages to
 application space to be used. Otherwise, there is a race to create
 cache alias.

 To resolve this issue, we need export function kmap_flush_unused as
 the driver is compiled as module. Then, the driver calls
 kmap_flush_unused if the allocated pages are in HIGHMEM and being
 used by kmap.

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?


Sorry, Peter. Ruibin is a new guy in LKML community. He uses outlook
to send emails. He would improve that.



That said, it sounds like you want set_memory_() to call
kmap_flush_unused(). Because this race it not at all specific to your
usage, it could happen to any set_memory_() site, right?

No. set_memory_() assumes the memory is not in HIGHMEM.
This scenario is driver allocates HIGHMEM pages, which are kmapped before.
Kernel uses a lazy method when kunmap a HIGHMEM page.
The pages are not unmapped from KMAP page table entries immediately.
When next kmap calling uses the same entry, kernel would change pte.
Or when change_page_attr_set_clr is called.

Our big driver doesn't call change_page_attr_set_clr when mmap the
pages with UNCACHABLE prot. It need call kmap_flush_unused directly after
allocating HIGHMEM pages.

Thanks for the kind comments.

Yanmin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] export the function kmap_flush_unused.

2014-08-12 Thread Zhang, Yanmin

On 2014/8/11 19:54, Peter Zijlstra wrote:

On Mon, Aug 11, 2014 at 01:26:45AM +, Sha, Ruibin wrote:

Hi Chintan,
Thank you very much for your timely and kindly response and comments.

Here is more detail about our Scenario:

 We have a big driver on Android product. The driver allocates lots of
 DDR pages. When applications mmap a file exported from the driver,
 driver would mmap the pages to the application space, usually with
 uncachable prot.
 On ia32/x86_64 arch, we have to avoid page cache alias issue. When
 driver allocates the pages, it would change page original mapping in
 page table with uncachable prot. Sometimes, the allocated page was
 used by kmap/kunmap. After kunmap, the page is still mapped in KMAP
 space. The entries in KMAP page table are not cleaned up until a
 kernel thread flushes the freed KMAP pages(usually it is woken up by 
kunmap).
 It means the driver need  force to flush the KMAP page table entries 
before mapping pages to
 application space to be used. Otherwise, there is a race to create
 cache alias.

 To resolve this issue, we need export function kmap_flush_unused as
 the driver is compiled as module. Then, the driver calls
 kmap_flush_unused if the allocated pages are in HIGHMEM and being
 used by kmap.

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?


Sorry, Peter. Ruibin is a new guy in LKML community. He uses outlook
to send emails. He would improve that.



That said, it sounds like you want set_memory_() to call
kmap_flush_unused(). Because this race it not at all specific to your
usage, it could happen to any set_memory_() site, right?

No. set_memory_() assumes the memory is not in HIGHMEM.
This scenario is driver allocates HIGHMEM pages, which are kmapped before.
Kernel uses a lazy method when kunmap a HIGHMEM page.
The pages are not unmapped from KMAP page table entries immediately.
When next kmap calling uses the same entry, kernel would change pte.
Or when change_page_attr_set_clr is called.

Our big driver doesn't call change_page_attr_set_clr when mmap the
pages with UNCACHABLE prot. It need call kmap_flush_unused directly after
allocating HIGHMEM pages.

Thanks for the kind comments.

Yanmin

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] export the function kmap_flush_unused.

2014-08-10 Thread Zhang, Yanmin


On 2014/8/11 9:26, Sha, Ruibin wrote:

Hi Chintan,
Thank you very much for your timely and kindly response and comments.

Here is more detail about our Scenario:

 We have a big driver on Android product. The driver allocates lots of
 DDR pages. When applications mmap a file exported from the driver,
 driver would mmap the pages to the application space, usually with
 uncachable prot.
 On ia32/x86_64 arch, we have to avoid page cache alias issue. When
 driver allocates the pages, it would change page original mapping in
 page table with uncachable prot. Sometimes, the allocated page was
 used by kmap/kunmap. After kunmap, the page is still mapped in KMAP
 space. The entries in KMAP page table are not cleaned up until a
 kernel thread flushes the freed KMAP pages(usually it is woken up by 
kunmap).
 It means the driver need  force to flush the KMAP page table entries 
before mapping pages to
 application space to be used. Otherwise, there is a race to create
 cache alias.

 To resolve this issue, we need export function kmap_flush_unused as
 the driver is compiled as module. Then, the driver calls
 kmap_flush_unused if the allocated pages are in HIGHMEM and being
 used by kmap.

Thanks again!

Best Regards
---
Sha, Rui bin ( Robin )
+86 13817890945
Android System Integration Shanghai

-Original Message-
From: Chintan Pandya [mailto:cpan...@codeaurora.org]
Sent: Friday, August 8, 2014 9:40 PM
To: Sha, Ruibin
Cc: linux-kernel@vger.kernel.org; linux...@kvack.org; m...@csn.ul.ie; 
a.p.zijls...@chello.nl; mgor...@suse.de; mi...@redhat.com; Zhang, Yanmin; He, Bo
Subject: Re: [PATCH] export the function kmap_flush_unused.

On 08/08/2014 02:46 PM, Sha, Ruibin wrote:

export the function kmap_flush_unused.

Scenario: When graphic driver need high memory spece, we use
alloc_pages() to allocate. But if the allocated page has just been
mapped in the KMAP space(like first kmap then kunmap) and no flush
page happened on PKMAP, the page virtual address is not NULL.Then when
we get that page and set page attribute like set_memory_uc and
set_memory_wc, we hit error.

Could you explain your scenario with more details ? set_memory_* should be 
applied on mapped address. And in attempt to map your page (which was just kmap 
and kunmap'ed), it will overwrite the previous mappings.

Moreover, in my view, kmap_flush_unused is just helping us in keeping the cache 
clean for kmap virtual addresses if they are unmapped. Is it serving any more 
purpose here ?


It depends on how to define 'clean' here. It resets pkmap_count[i] to 0,
and cleans up page table entries used by PKMAP. Here, our scenario is
caused by the late page table entry cleanup as driver need avoid page
cache alias.




fix: For that scenario,when we get the allocated page and its virtual
address is not NULL, we would like first flush that page.
So need export that function kmap_flush_unused.

Signed-off-by: sha, ruibin 

---
mm/highmem.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/mm/highmem.c b/mm/highmem.c
index b32b70c..511299b 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -156,6 +156,7 @@ void kmap_flush_unused(void)
flush_all_zero_pkmaps();
unlock_kmap();
}
+EXPORT_SYMBOL(kmap_flush_unused);

This symbol is already extern'ed. Is it not sufficient for your case ?


We want to call it in driver module. extern is not enough.

Thanks,
Yanmin


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] export the function kmap_flush_unused.

2014-08-10 Thread Zhang, Yanmin


On 2014/8/11 9:26, Sha, Ruibin wrote:

Hi Chintan,
Thank you very much for your timely and kindly response and comments.

Here is more detail about our Scenario:

 We have a big driver on Android product. The driver allocates lots of
 DDR pages. When applications mmap a file exported from the driver,
 driver would mmap the pages to the application space, usually with
 uncachable prot.
 On ia32/x86_64 arch, we have to avoid page cache alias issue. When
 driver allocates the pages, it would change page original mapping in
 page table with uncachable prot. Sometimes, the allocated page was
 used by kmap/kunmap. After kunmap, the page is still mapped in KMAP
 space. The entries in KMAP page table are not cleaned up until a
 kernel thread flushes the freed KMAP pages(usually it is woken up by 
kunmap).
 It means the driver need  force to flush the KMAP page table entries 
before mapping pages to
 application space to be used. Otherwise, there is a race to create
 cache alias.

 To resolve this issue, we need export function kmap_flush_unused as
 the driver is compiled as module. Then, the driver calls
 kmap_flush_unused if the allocated pages are in HIGHMEM and being
 used by kmap.

Thanks again!

Best Regards
---
Sha, Rui bin ( Robin )
+86 13817890945
Android System Integration Shanghai

-Original Message-
From: Chintan Pandya [mailto:cpan...@codeaurora.org]
Sent: Friday, August 8, 2014 9:40 PM
To: Sha, Ruibin
Cc: linux-kernel@vger.kernel.org; linux...@kvack.org; m...@csn.ul.ie; 
a.p.zijls...@chello.nl; mgor...@suse.de; mi...@redhat.com; Zhang, Yanmin; He, Bo
Subject: Re: [PATCH] export the function kmap_flush_unused.

On 08/08/2014 02:46 PM, Sha, Ruibin wrote:

export the function kmap_flush_unused.

Scenario: When graphic driver need high memory spece, we use
alloc_pages() to allocate. But if the allocated page has just been
mapped in the KMAP space(like first kmap then kunmap) and no flush
page happened on PKMAP, the page virtual address is not NULL.Then when
we get that page and set page attribute like set_memory_uc and
set_memory_wc, we hit error.

Could you explain your scenario with more details ? set_memory_* should be 
applied on mapped address. And in attempt to map your page (which was just kmap 
and kunmap'ed), it will overwrite the previous mappings.

Moreover, in my view, kmap_flush_unused is just helping us in keeping the cache 
clean for kmap virtual addresses if they are unmapped. Is it serving any more 
purpose here ?


It depends on how to define 'clean' here. It resets pkmap_count[i] to 0,
and cleans up page table entries used by PKMAP. Here, our scenario is
caused by the late page table entry cleanup as driver need avoid page
cache alias.




fix: For that scenario,when we get the allocated page and its virtual
address is not NULL, we would like first flush that page.
So need export that function kmap_flush_unused.

Signed-off-by: sha, ruibin ruibin@intel.com

---
mm/highmem.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/mm/highmem.c b/mm/highmem.c
index b32b70c..511299b 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -156,6 +156,7 @@ void kmap_flush_unused(void)
flush_all_zero_pkmaps();
unlock_kmap();
}
+EXPORT_SYMBOL(kmap_flush_unused);

This symbol is already extern'ed. Is it not sufficient for your case ?


We want to call it in driver module. extern is not enough.

Thanks,
Yanmin


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH] md/dm-ioctl.c: optimize memory allocation in copy_params

2014-07-23 Thread Zhang, Yanmin

On 2014/7/24 1:14, Mikulas Patocka wrote:


On Wed, 23 Jul 2014, Alasdair G Kergon wrote:


On Wed, Jul 23, 2014 at 08:16:58AM -0400, Mikulas Patocka wrote:

So, it means that you do not use device mapper at all. So, why are you
trying to change memory allocation in device mapper?
  
So the *test* they run is asking device-mapper to briefly reserve a 64KB

buffer when there is no data to report:  The answer is not to run that
pointless test:)

And if a single 64KB allocation really is a big deal, then patch 'vold'
in userspace so it doesn't ask for 64KB when it clearly doesn't need it!

+ int Devmapper::dumpState(SocketClient *c) {
+char *buffer = (char *) malloc(1024 * 64);

The code has just
#define DEVMAPPER_BUFFER_SIZE 4096
for all the other dm ioctls it issues.

Only use a larger value when it is needed i.e. if DM_BUFFER_FULL_FLAG gets set.

Alasdair

Device mapper shouldn't depend on allocation on any contiguous memory - it
will fall back to vmalloc. I still can't believe that their suggested
patch makes any difference.

This pattern is being repeated over and over in the kernel - for example:

 if (PIDLIST_TOO_LARGE(count))
 return vmalloc(count * sizeof(pid_t));
 else
 return kmalloc(count * sizeof(pid_t), GFP_KERNEL);


 if (is_vmalloc_addr(p))
 vfree(p);
 else
 kfree(p);

- I think we should make two functions that do this operation (for example
kvalloc and kvfree) and convert device mapper and other users to these
functions. Then, other kernel subsystems can start to use them to fix
memory fragmentation issues.


Thank Mikulas and Alasdair. Before sending out the patch, we know the result. :)
It's hard to balance between performance and stability.

Anyway, we would try to change seq_read.

Yanmin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH] md/dm-ioctl.c: optimize memory allocation in copy_params

2014-07-23 Thread Zhang, Yanmin

On 2014/7/24 1:14, Mikulas Patocka wrote:


On Wed, 23 Jul 2014, Alasdair G Kergon wrote:


On Wed, Jul 23, 2014 at 08:16:58AM -0400, Mikulas Patocka wrote:

So, it means that you do not use device mapper at all. So, why are you
trying to change memory allocation in device mapper?
  
So the *test* they run is asking device-mapper to briefly reserve a 64KB

buffer when there is no data to report:  The answer is not to run that
pointless test:)

And if a single 64KB allocation really is a big deal, then patch 'vold'
in userspace so it doesn't ask for 64KB when it clearly doesn't need it!

+ int Devmapper::dumpState(SocketClient *c) {
+char *buffer = (char *) malloc(1024 * 64);

The code has just
#define DEVMAPPER_BUFFER_SIZE 4096
for all the other dm ioctls it issues.

Only use a larger value when it is needed i.e. if DM_BUFFER_FULL_FLAG gets set.

Alasdair

Device mapper shouldn't depend on allocation on any contiguous memory - it
will fall back to vmalloc. I still can't believe that their suggested
patch makes any difference.

This pattern is being repeated over and over in the kernel - for example:

 if (PIDLIST_TOO_LARGE(count))
 return vmalloc(count * sizeof(pid_t));
 else
 return kmalloc(count * sizeof(pid_t), GFP_KERNEL);


 if (is_vmalloc_addr(p))
 vfree(p);
 else
 kfree(p);

- I think we should make two functions that do this operation (for example
kvalloc and kvfree) and convert device mapper and other users to these
functions. Then, other kernel subsystems can start to use them to fix
memory fragmentation issues.


Thank Mikulas and Alasdair. Before sending out the patch, we know the result. :)
It's hard to balance between performance and stability.

Anyway, we would try to change seq_read.

Yanmin

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH] md/dm-ioctl.c: optimize memory allocation in copy_params

2014-07-22 Thread Zhang, Yanmin

On 2014/7/22 10:04, Alasdair G Kergon wrote:

On Tue, Jul 22, 2014 at 02:23:52AM +0100, Alasdair G Kergon wrote:

Unanswered.  Let's ask the same question in a different way:
  
A quick search for 'vold' returns:

   https://android.googlesource.com/platform/system/vold/
and the code there requests a fixed 64K allocation to hold the names of
active volumes.

So unlike libdevmapper-based applications where a smaller allocation is
used at first and only extended if needed, Android just assumes that
64KB is enough for everyone.

So is your claim that:

1. This 64KB allocation for the brief duration of the ioctl to store the
names of active device-mapper volumes leads to memory problems?
[Mustn't the system *already* be in a bad state if this pushes it over
the limit?]


It's a good question.
1) Usually, Android mobile runs for a long time. It's very command that
users don't turn off the phones for months. Users might start lots of
applications. memory is used up in the end. Kernel might recollect memory
over and over again.
2) We never blames this Out of memory issue fully to DM.
3) We want to improve the OOM issue.



and

2. The systems on which this memory shortage occurs have so many volumes
(with long names?) that a smaller allocation would not suffice?


64K is small, comparing with 2GB, even 4GB total memory. However, this 64K
by kmalloc has a strict criteria. It might be continuous physical memory
and align with 64K. If memory is used up, freed memory is very fragmented.
Such 64K criteria is a hard request.
Usually, driver can allocate such memory at booting initialization. After
that, it should allocate many sparse pages instead of continuous memory.
Here 64K allocation is after booting.

Thanks for the kind comments.

Yanmin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH] md/dm-ioctl.c: optimize memory allocation in copy_params

2014-07-22 Thread Zhang, Yanmin

On 2014/7/22 9:23, Alasdair G Kergon wrote:

On 2014/7/9 6:39, Mikulas Patocka wrote:

Which ioctl with more than 16kB
arguments do you use?

Unanswered.  Let's ask the same question in a different way:

Please supply the output of these three commands on the real-world system on
which you believe that that particular allocation is causing you a problem:
   dmsetup info -c
   dmsetup table
   dmsetup status


Android doesn't include the command. We compiled lvm2-2_02_107.tar.gz and copy
dmsetup to the board. But command reports No devices.

# /data/bin/dmsetup info
No devices found

# cat /proc/devices
Character devices:
  1 mem
  4 /dev/vc/0
  4 tty
  4 ttyS
  4 ttyMFD
  5 /dev/tty
  5 /dev/console
  5 /dev/ptmx
  7 vcs
 10 misc
 13 input
 21 sg
 29 fb
 81 video4linux
 89 i2c
108 ppp
116 alsa
128 ptm
136 pts
153 spi
166 ttyACM
180 usb
188 ttyUSB
189 usb_device
202 cpu/msr
203 cpu/cpuid
226 drm
241 mdm_ctrl
242 sep54
243 roccat
244 hidraw
245 ttyGS
246 usbmon
247 ttyPTI
248 gsmtty
249 bsg
250 iio
251 ptp
252 pps
253 media
254 rtc

Block devices:
  1 ramdisk
259 blkext
  7 loop
  8 sd
  9 md
 11 sr
 65 sd
 66 sd
 67 sd
 68 sd
 69 sd
 70 sd
 71 sd
128 sd
129 sd
130 sd
131 sd
132 sd
133 sd
134 sd
135 sd
179 mmc
253 device-mapper
254 mdp

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH] md/dm-ioctl.c: optimize memory allocation in copy_params

2014-07-22 Thread Zhang, Yanmin

On 2014/7/22 9:23, Alasdair G Kergon wrote:

On 2014/7/9 6:39, Mikulas Patocka wrote:

Which ioctl with more than 16kB
arguments do you use?

Unanswered.  Let's ask the same question in a different way:

Please supply the output of these three commands on the real-world system on
which you believe that that particular allocation is causing you a problem:
   dmsetup info -c
   dmsetup table
   dmsetup status


Android doesn't include the command. We compiled lvm2-2_02_107.tar.gz and copy
dmsetup to the board. But command reports No devices.

# /data/bin/dmsetup info
No devices found

# cat /proc/devices
Character devices:
  1 mem
  4 /dev/vc/0
  4 tty
  4 ttyS
  4 ttyMFD
  5 /dev/tty
  5 /dev/console
  5 /dev/ptmx
  7 vcs
 10 misc
 13 input
 21 sg
 29 fb
 81 video4linux
 89 i2c
108 ppp
116 alsa
128 ptm
136 pts
153 spi
166 ttyACM
180 usb
188 ttyUSB
189 usb_device
202 cpu/msr
203 cpu/cpuid
226 drm
241 mdm_ctrl
242 sep54
243 roccat
244 hidraw
245 ttyGS
246 usbmon
247 ttyPTI
248 gsmtty
249 bsg
250 iio
251 ptp
252 pps
253 media
254 rtc

Block devices:
  1 ramdisk
259 blkext
  7 loop
  8 sd
  9 md
 11 sr
 65 sd
 66 sd
 67 sd
 68 sd
 69 sd
 70 sd
 71 sd
128 sd
129 sd
130 sd
131 sd
132 sd
133 sd
134 sd
135 sd
179 mmc
253 device-mapper
254 mdp

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH] md/dm-ioctl.c: optimize memory allocation in copy_params

2014-07-22 Thread Zhang, Yanmin

On 2014/7/22 10:04, Alasdair G Kergon wrote:

On Tue, Jul 22, 2014 at 02:23:52AM +0100, Alasdair G Kergon wrote:

Unanswered.  Let's ask the same question in a different way:
  
A quick search for 'vold' returns:

   https://android.googlesource.com/platform/system/vold/
and the code there requests a fixed 64K allocation to hold the names of
active volumes.

So unlike libdevmapper-based applications where a smaller allocation is
used at first and only extended if needed, Android just assumes that
64KB is enough for everyone.

So is your claim that:

1. This 64KB allocation for the brief duration of the ioctl to store the
names of active device-mapper volumes leads to memory problems?
[Mustn't the system *already* be in a bad state if this pushes it over
the limit?]


It's a good question.
1) Usually, Android mobile runs for a long time. It's very command that
users don't turn off the phones for months. Users might start lots of
applications. memory is used up in the end. Kernel might recollect memory
over and over again.
2) We never blames this Out of memory issue fully to DM.
3) We want to improve the OOM issue.



and

2. The systems on which this memory shortage occurs have so many volumes
(with long names?) that a smaller allocation would not suffice?


64K is small, comparing with 2GB, even 4GB total memory. However, this 64K
by kmalloc has a strict criteria. It might be continuous physical memory
and align with 64K. If memory is used up, freed memory is very fragmented.
Such 64K criteria is a hard request.
Usually, driver can allocate such memory at booting initialization. After
that, it should allocate many sparse pages instead of continuous memory.
Here 64K allocation is after booting.

Thanks for the kind comments.

Yanmin

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH] md/dm-ioctl.c: optimize memory allocation in copy_params

2014-07-21 Thread Zhang, Yanmin

On 2014/7/9 22:53, Mikulas Patocka wrote:


On Wed, 9 Jul 2014, Zhang, Yanmin wrote:


On 2014/7/9 6:39, Mikulas Patocka wrote:


Hi

Mikulas,

Thanks for your kind comments.


I don't really know what is the purpose of this patch. In existing device
mapper code, if kmalloc fails, the allocation is retried with __vmalloc.
So there is no need to avoid kmalloc aritifically.

kmalloc doesn't cause memory fragmentation. If the memory is too
fragmented, kmalloc fails. If it isn't, it succeeds. But it doesn't cause
memory being fragmented.

I agree with you. The patch's original description is not appropriate.
Basically, memory fragmentation is not caused by this kmalloc.

The right description is: When memory is fragmented and most memory is used
up,
kmalloc a big memory might cause lots of OutOFMemory and system might kill
lots of processes. Then, system might hang.


Sorry for replying you too late. I am very busy in some other critical issues.


The question is - does this particular kmalloc in device mapper cause out
of memory or killing of other tasks? It has flags __GFP_NORETRY,


When memory is fragmented, drivers need allocate small pages instead of big
memory. Even with __GFP_NORETRY, driver might get a big memory by luck. That 
means
other drivers would get fewer chances to fulfill their memory requests, such 
like
allocating 2 pages for task_struct. Later on, OOM might happen.


__GFP_NOMEMALLOC, __GFP_NOWARN so it shouldn't cause any trouble. It
should just fail silently if memory is fragmented.


It's hard to say this call causes out of memory. There are many such places
in kernel to allocate big continuous memory. One is in seq_read, where we
created a patch to use vmalloc instead of kmalloc to fix it, but got far
worse comments as it's very old code. Another is in our own gfx driver.
We want to fix all. We can't blame the OOM to just one place.

Monkey testing is popular for Android development. We run the testing 
frequently.
It might start lots of applications. Eventually, it is a comprehensive testing.



Do you have some stacktrace that identifies this kmalloc as a problem?


Sometimes, when OOM happens, kernel log shows some backtrace of big continuous
memory allocation failure. Sometimes, when board can't respond and watchdog 
might
reset the board after saving thread callchain into disk.



Do this test - prepare two kernels that are identical, except that one
kernel has that one-line change in dm-ioctl. Boot each kernel 10 times, do
exactly the same operation after boot. Does the kernel with the patch
always behave correctly and does the kernel without the patch always fail?


No. Instead of just one, many places have impact on the OOM issue.


Report the result - how many failures did you get with or without that
one-line patch. Without such a test - I just don't believe that your patch
makes any difference.

Another question - your patch only makes change if some device mapper
ioctl has more than 16kB arugments. Which ioctl with more than 16kB
arguments do you use? Do you load such a big table to device mapper? How
often do you call that ioctl with such big arguments?


Xinhui's email mentions the ioctl details. In android, there is a command
"dumpstate", it run many other commands to collect information. In our
scenario, it run command "vdc dump", and vdc uses socket to pass some
parameters to "vold", then vold generates ioctl.

Thanks for your patience.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH] md/dm-ioctl.c: optimize memory allocation in copy_params

2014-07-21 Thread Zhang, Yanmin

On 2014/7/9 22:53, Mikulas Patocka wrote:


On Wed, 9 Jul 2014, Zhang, Yanmin wrote:


On 2014/7/9 6:39, Mikulas Patocka wrote:


Hi

Mikulas,

Thanks for your kind comments.


I don't really know what is the purpose of this patch. In existing device
mapper code, if kmalloc fails, the allocation is retried with __vmalloc.
So there is no need to avoid kmalloc aritifically.

kmalloc doesn't cause memory fragmentation. If the memory is too
fragmented, kmalloc fails. If it isn't, it succeeds. But it doesn't cause
memory being fragmented.

I agree with you. The patch's original description is not appropriate.
Basically, memory fragmentation is not caused by this kmalloc.

The right description is: When memory is fragmented and most memory is used
up,
kmalloc a big memory might cause lots of OutOFMemory and system might kill
lots of processes. Then, system might hang.


Sorry for replying you too late. I am very busy in some other critical issues.


The question is - does this particular kmalloc in device mapper cause out
of memory or killing of other tasks? It has flags __GFP_NORETRY,


When memory is fragmented, drivers need allocate small pages instead of big
memory. Even with __GFP_NORETRY, driver might get a big memory by luck. That 
means
other drivers would get fewer chances to fulfill their memory requests, such 
like
allocating 2 pages for task_struct. Later on, OOM might happen.


__GFP_NOMEMALLOC, __GFP_NOWARN so it shouldn't cause any trouble. It
should just fail silently if memory is fragmented.


It's hard to say this call causes out of memory. There are many such places
in kernel to allocate big continuous memory. One is in seq_read, where we
created a patch to use vmalloc instead of kmalloc to fix it, but got far
worse comments as it's very old code. Another is in our own gfx driver.
We want to fix all. We can't blame the OOM to just one place.

Monkey testing is popular for Android development. We run the testing 
frequently.
It might start lots of applications. Eventually, it is a comprehensive testing.



Do you have some stacktrace that identifies this kmalloc as a problem?


Sometimes, when OOM happens, kernel log shows some backtrace of big continuous
memory allocation failure. Sometimes, when board can't respond and watchdog 
might
reset the board after saving thread callchain into disk.



Do this test - prepare two kernels that are identical, except that one
kernel has that one-line change in dm-ioctl. Boot each kernel 10 times, do
exactly the same operation after boot. Does the kernel with the patch
always behave correctly and does the kernel without the patch always fail?


No. Instead of just one, many places have impact on the OOM issue.


Report the result - how many failures did you get with or without that
one-line patch. Without such a test - I just don't believe that your patch
makes any difference.

Another question - your patch only makes change if some device mapper
ioctl has more than 16kB arugments. Which ioctl with more than 16kB
arguments do you use? Do you load such a big table to device mapper? How
often do you call that ioctl with such big arguments?


Xinhui's email mentions the ioctl details. In android, there is a command
dumpstate, it run many other commands to collect information. In our
scenario, it run command vdc dump, and vdc uses socket to pass some
parameters to vold, then vold generates ioctl.

Thanks for your patience.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] perf: Don't enable the perf_event without in PERF_ATTACH_CONTEXT status

2014-07-15 Thread Zhang, Yanmin


On 2014/7/14 21:27, Peter Zijlstra wrote:

On Thu, Jul 03, 2014 at 11:36:38AM +0800, Chen LinX wrote:

From: "Chen LinX" 

when do cpu hotplug test and run below perf test together, pmu may access freed 
perf_event

while true;
do
perf record -a -g -f sleep 10
rm perf.*
done

the scenario is that when cpu offline firstly, the 'perf_cpu_notify' will 
disable event on the
pmu and remove it from the context list. after cpu online, the perf app may 
enable the event

But it does not, right?


Thanks for your kind comments.
It does, actually.
The major reason is application calls many syscall to start perf.
1) perf_event_open => perf_install_in_context;
2) perf_ioctl => perf_event_enable.

After step 1), the cpu might be hot unplugged, and the event is removed from
the cpu context.
Then, the cpu is hog plugged back. The app runs at step 2) to enable the
event on that cpu.
Then, the cpu is hot unplugged again. As the event is not linked to
the cpu's context, perf_cpu_notify can't disable the events. Mostly, the
event is linked in cpuc->event_list[XXX]. At that time, applcation
might free the event.

When the cpu is plugged back, it might use the freed event and cause kernel
panic.




that without linked in context list again. when cpu offine the second time, the 
'perf_cpu_notify'
can't disable event on the pmu as the event doesn't link to context list. the 
perf app may free
this event later(the free procedure try to disable event on the pmu but as the 
cpu is offline,
the 'cpu_function_call(event->cpu, __perf_remove_from_context, event)' is 
failed)

Failed how, below is __perf_install_in_context.


. then after
cpu online again, pmu will access freed perf_event and hit panic.

so adding PERF_ATTACH_CONTEXT flag check before enable event to avoid this 
scenario.

In fact it does not. If you look at perf_event_enable() there's a code
path that doesn't call __perf_event_enable().


Here we hit it with a per-cpu event instead of task event.




[  157.670138 ]  [] __perf_install_in_context+0xff/0x170

And yet, __perf_install_in_context isn't mentioned at all in the above.


Change-Id: I7265d83159b9180e9be3a370ba50e067385547bd
Signed-off-by: Yanmin Zhang 
Signed-off-by: Chen LinX 

Wrong SoB-chain, Yanmin didn't author this patch did he, seeing how From
is you. And Yanmin didn't actually send me this patch either.


Lin works with me in the same team. He is smart, but new in kernel upstream

community. I debugged with him and he caught the root cause ahead of me.

The patch is good. Lin is running more testing with the latest kernel
3.16.0-rc5 now.

Thanks,

Yanmin



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] perf: Don't enable the perf_event without in PERF_ATTACH_CONTEXT status

2014-07-15 Thread Zhang, Yanmin


On 2014/7/14 21:27, Peter Zijlstra wrote:

On Thu, Jul 03, 2014 at 11:36:38AM +0800, Chen LinX wrote:

From: Chen LinX linx.z.c...@intel.com

when do cpu hotplug test and run below perf test together, pmu may access freed 
perf_event

while true;
do
perf record -a -g -f sleep 10
rm perf.*
done

the scenario is that when cpu offline firstly, the 'perf_cpu_notify' will 
disable event on the
pmu and remove it from the context list. after cpu online, the perf app may 
enable the event

But it does not, right?


Thanks for your kind comments.
It does, actually.
The major reason is application calls many syscall to start perf.
1) perf_event_open = perf_install_in_context;
2) perf_ioctl = perf_event_enable.

After step 1), the cpu might be hot unplugged, and the event is removed from
the cpu context.
Then, the cpu is hog plugged back. The app runs at step 2) to enable the
event on that cpu.
Then, the cpu is hot unplugged again. As the event is not linked to
the cpu's context, perf_cpu_notify can't disable the events. Mostly, the
event is linked in cpuc-event_list[XXX]. At that time, applcation
might free the event.

When the cpu is plugged back, it might use the freed event and cause kernel
panic.




that without linked in context list again. when cpu offine the second time, the 
'perf_cpu_notify'
can't disable event on the pmu as the event doesn't link to context list. the 
perf app may free
this event later(the free procedure try to disable event on the pmu but as the 
cpu is offline,
the 'cpu_function_call(event-cpu, __perf_remove_from_context, event)' is 
failed)

Failed how, below is __perf_install_in_context.


. then after
cpu online again, pmu will access freed perf_event and hit panic.

so adding PERF_ATTACH_CONTEXT flag check before enable event to avoid this 
scenario.

In fact it does not. If you look at perf_event_enable() there's a code
path that doesn't call __perf_event_enable().


Here we hit it with a per-cpu event instead of task event.




[  157.670138 ]  [8216321f] __perf_install_in_context+0xff/0x170

And yet, __perf_install_in_context isn't mentioned at all in the above.


Change-Id: I7265d83159b9180e9be3a370ba50e067385547bd
Signed-off-by: Yanmin Zhang yanmin.zh...@intel.com
Signed-off-by: Chen LinX linx.z.c...@intel.com

Wrong SoB-chain, Yanmin didn't author this patch did he, seeing how From
is you. And Yanmin didn't actually send me this patch either.


Lin works with me in the same team. He is smart, but new in kernel upstream

community. I debugged with him and he caught the root cause ahead of me.

The patch is good. Lin is running more testing with the latest kernel
3.16.0-rc5 now.

Thanks,

Yanmin



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH] md/dm-ioctl.c: optimize memory allocation in copy_params

2014-07-08 Thread Zhang, Yanmin

On 2014/7/9 6:39, Mikulas Patocka wrote:


Hi


Mikulas,

Thanks for your kind comments.


I don't really know what is the purpose of this patch. In existing device
mapper code, if kmalloc fails, the allocation is retried with __vmalloc.
So there is no need to avoid kmalloc aritifically.

kmalloc doesn't cause memory fragmentation. If the memory is too
fragmented, kmalloc fails. If it isn't, it succeeds. But it doesn't cause
memory being fragmented.


I agree with you. The patch's original description is not appropriate.
Basically, memory fragmentation is not caused by this kmalloc.

The right description is: When memory is fragmented and most memory is used up,
kmalloc a big memory might cause lots of OutOFMemory and system might kill
lots of processes. Then, system might hang.

We are enabling Android on mobiles and tablets. We hit OOM often at
Monkey and other stress testing. Sometimes, kernel prints big memory allocation
warning.

Theoretically, it's a good idea to call kmalloc firstly. If it fails, use 
vmalloc.
However, kernel assumes drivers do the right thing. When drivers call kmalloc to
allocate a big memory, memory management would do the best to fulfill it. When 
memory
is fragmented and most memory is used up, such allocation would cause big memory
recollection. Some processes might be killed. The worse scenario is after a 
process
is killed, it is restarted and allocates memory again. That causes system hang,
or mobile users feel a big stall. We did fix a similar issue in one of our 
drivers.

Usually, kernel and drivers can allocate big continuous memory at booting or
initialization stage. After that, they need allocate small order memory. The 
best
is to just allocate order-0 memory.




If you have some other driver that fails because of large kmalloc failure,
you should fix it to use scatter-gather DMA or (if the hardware can't do
it) preallocate the memory when the driver is loaded.


I agree with you. Here the patch fixes the issue, where dm might allocate
big continuous memory after booting. Monkey testing triggers it by operating
menu Setting=>Security=>InstallfromSDcard.

We are catching all places in kernel where big memory allocation happens.
This patch is just to fix one case.

Thanks,
Yanmin



Mikulas



On Fri, 4 Jul 2014, xinhui.pan wrote:


KMALLOC_MAX_SIZE is too big, it easily cause memory fragmented and low memory when 
param_kernel->data_size
is also big. That's not nice. So use vmalloc instead of kmalloc when size is larger 
than (PAGE_SIZE << 2).
There are other drivers using kmalloc to gain a big size memory. And that cause 
our devices to hit hang and
many worse issues. We don't benefit much when using kmalloc in such scenario.

Signed-off-by: xinhui.pan 
Signed-off-by: yanmin.zhang 
---
  drivers/md/dm-ioctl.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index 5152142..31c3af9 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1709,7 +1709,7 @@ static int copy_params(struct dm_ioctl __user *user, 
struct dm_ioctl *param_kern
  * Use kmalloc() rather than vmalloc() when we can.
  */
 dmi = NULL;
-   if (param_kernel->data_size <= KMALLOC_MAX_SIZE) {
+   if (param_kernel->data_size <= (PAGE_SIZE << 2)) {
 dmi = kmalloc(param_kernel->data_size, GFP_NOIO | 
__GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
 if (dmi)
 *param_flags |= DM_PARAMS_KMALLOC;
--
1.7.9.5

--
dm-devel mailing list
dm-de...@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH] md/dm-ioctl.c: optimize memory allocation in copy_params

2014-07-08 Thread Zhang, Yanmin

On 2014/7/9 6:39, Mikulas Patocka wrote:


Hi


Mikulas,

Thanks for your kind comments.


I don't really know what is the purpose of this patch. In existing device
mapper code, if kmalloc fails, the allocation is retried with __vmalloc.
So there is no need to avoid kmalloc aritifically.

kmalloc doesn't cause memory fragmentation. If the memory is too
fragmented, kmalloc fails. If it isn't, it succeeds. But it doesn't cause
memory being fragmented.


I agree with you. The patch's original description is not appropriate.
Basically, memory fragmentation is not caused by this kmalloc.

The right description is: When memory is fragmented and most memory is used up,
kmalloc a big memory might cause lots of OutOFMemory and system might kill
lots of processes. Then, system might hang.

We are enabling Android on mobiles and tablets. We hit OOM often at
Monkey and other stress testing. Sometimes, kernel prints big memory allocation
warning.

Theoretically, it's a good idea to call kmalloc firstly. If it fails, use 
vmalloc.
However, kernel assumes drivers do the right thing. When drivers call kmalloc to
allocate a big memory, memory management would do the best to fulfill it. When 
memory
is fragmented and most memory is used up, such allocation would cause big memory
recollection. Some processes might be killed. The worse scenario is after a 
process
is killed, it is restarted and allocates memory again. That causes system hang,
or mobile users feel a big stall. We did fix a similar issue in one of our 
drivers.

Usually, kernel and drivers can allocate big continuous memory at booting or
initialization stage. After that, they need allocate small order memory. The 
best
is to just allocate order-0 memory.




If you have some other driver that fails because of large kmalloc failure,
you should fix it to use scatter-gather DMA or (if the hardware can't do
it) preallocate the memory when the driver is loaded.


I agree with you. Here the patch fixes the issue, where dm might allocate
big continuous memory after booting. Monkey testing triggers it by operating
menu Setting=Security=InstallfromSDcard.

We are catching all places in kernel where big memory allocation happens.
This patch is just to fix one case.

Thanks,
Yanmin



Mikulas



On Fri, 4 Jul 2014, xinhui.pan wrote:


KMALLOC_MAX_SIZE is too big, it easily cause memory fragmented and low memory when 
param_kernel-data_size
is also big. That's not nice. So use vmalloc instead of kmalloc when size is larger 
than (PAGE_SIZE  2).
There are other drivers using kmalloc to gain a big size memory. And that cause 
our devices to hit hang and
many worse issues. We don't benefit much when using kmalloc in such scenario.

Signed-off-by: xinhui.pan xinhuix@intel.com
Signed-off-by: yanmin.zhang yanmin_zh...@linux.intel.com
---
  drivers/md/dm-ioctl.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index 5152142..31c3af9 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1709,7 +1709,7 @@ static int copy_params(struct dm_ioctl __user *user, 
struct dm_ioctl *param_kern
  * Use kmalloc() rather than vmalloc() when we can.
  */
 dmi = NULL;
-   if (param_kernel-data_size = KMALLOC_MAX_SIZE) {
+   if (param_kernel-data_size = (PAGE_SIZE  2)) {
 dmi = kmalloc(param_kernel-data_size, GFP_NOIO | 
__GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
 if (dmi)
 *param_flags |= DM_PARAMS_KMALLOC;
--
1.7.9.5

--
dm-devel mailing list
dm-de...@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Should Pstore(ramoops) records customized information?

2014-06-30 Thread Zhang, Yanmin

On 2014/6/27 20:06, Liu hua wrote:

于 2014/6/26 8:57, Zhang, Yanmin 写道:

On 2014/6/25 21:08, Liu hua wrote:

于 2014/6/25 8:41, Zhang, Yanmin 写道:

On 2014/6/20 18:47, Liu hua wrote:

On 2014/6/20 7:42, Luck, Tony wrote:


BTW, I note that "extern struct pstore_info *psinfo" locates in
fs/pstore/internal.h. So users out of directory "fs/pstore/" can not use pstore 
to
record messages. We do not want other kernel users to use pstore, right?  And 
can we
break this?

Yes we can make some interface visible to the rest of the kernel ... probably
not the raw "*psinfo" though. Perhaps the pstore_alloc_ring_buffer() and
pstore_write_ring_buffer() functions should be the ones exported to the
rest of the kernel.


ditoo.. Since other backends like efi and erst may can not privide such ring 
buffer.
So pstore_alloc_ring_buffer should be a funciton pointer of pstore_info struct.

Yes - that allows less capable backend like ERST and efivars to not provide the
service.  Since it becomes internal, you can drop the "pstore_" prefix.  E.g.
something like:

int pstore_alloc_ring_buffer(char *name, int size)
{
  return psinfo->alloc_ring_buffer(name, size);
}
EXPORT_SYMBOL_GPL(pstore_alloc_ring_buffer);

... and you have to find/make some global header for the "extern" declaration.

I will make these RFC patch series according to our discussion. Thanks you very 
to
valuable advice.

Sorry for seeing your email late.We already worked out some patches to 
restructure
pstore. Would you like to try patchset 
http://article.gmane.org/gmane.linux.kernel/1697680/?

We have more patches available to add some flags to disable/enable specific 
zones.

That's great! I have tried your patches. BTW, Your patches do not work on ARM 
platform,
before I changed linker scripts;

Initially, we just implemented it on x86. It's easy to extend it to ARM. Mostly 
change the arm
vmlinux.lds.S to add the sections. Pls. also change setup_arch to allocate 
memory blocks for
pstore.
In the patchset, there is an example patch, including reserve memory and zone 
examples.
Pls. reference it.


   And can we use this method in modules(I failed to do that)?

It's a good question. There are many approaches to support modules.
1) Define the zone in built-in files and export it.Then, you can use it in 
module.
2) Define the zone and new tracer functions in built-in files and export
the tracer functions.


After a quick glance and try, I think my idea is a little different from yours. 
I will reply you
later.

Pls. Share your opinions. We are improving pstore to make it easier to be used.


This feature can use in real products (actually we have done that), because 
usually several
mega-byte-size ram is enough and it is very useful for fault location. So I 
want that pstore
can be implemented in products, not just in labs.

These seems that there are at least two ways to make pstore visible to other 
kernel users:


(1) static allocation:(your way, maybe my description is not good, please 
correct me)

When kernel image is made, zones are determinate;
So if moudules want to use a zone, we should define it in kernel source 
before compiling
and export it;

   (a)advantage:

This method will not fail at most time if it passed at first time.

   (b)disadvantage:

Engineer should change the kernel source code if he want to get a zone 
to record something. For lab,
 it is good enough; but for products, different products may use 
different kernel source codes if they
 want record different messages. It is very expensive.

(2) dynamic alloction: (ring buffer similar to your zone)

(1) We should introduce metadata to describe the ring buffers in the 
ramoops bankend.
So when initializing, we just need to read metadata. then we know 
information of all ring buffers.
So we can read and manage all ring buffers in a list named 
"ring_buffer_list";

(2) when we call pstore_alloc_ring_buffer(name,size). If 
"ring_buffer_list" contains this ring buffer,
we do nothing if size check passes. Else we create a ring buffer 
description and add it to list;

(3) When we call pstore_write_ring_buffer(name,str). we find this ring buffer in 
"ring_buffer_list" by name;
and then copy strings to this ring buffer.


disadvantage:

   (1) So this alloc maybe fail when name is the same, but size isn't;
   (2) We should find a way to format the backend. erase is not enough. 
(earse just clear the data of a ring buffer,
but the ring buffer still exists).

advantage:
   (1) Os venders can implement this feature. Then people can use it 
even they can not compile the kernel.
   (2) We can determin how to use the ramoops backend at runtime, 
rather than before compiling t

Re: Should Pstore(ramoops) records customized information?

2014-06-30 Thread Zhang, Yanmin

On 2014/6/27 20:06, Liu hua wrote:

于 2014/6/26 8:57, Zhang, Yanmin 写道:

On 2014/6/25 21:08, Liu hua wrote:

于 2014/6/25 8:41, Zhang, Yanmin 写道:

On 2014/6/20 18:47, Liu hua wrote:

On 2014/6/20 7:42, Luck, Tony wrote:


BTW, I note that extern struct pstore_info *psinfo locates in
fs/pstore/internal.h. So users out of directory fs/pstore/ can not use pstore 
to
record messages. We do not want other kernel users to use pstore, right?  And 
can we
break this?

Yes we can make some interface visible to the rest of the kernel ... probably
not the raw *psinfo though. Perhaps the pstore_alloc_ring_buffer() and
pstore_write_ring_buffer() functions should be the ones exported to the
rest of the kernel.


ditoo.. Since other backends like efi and erst may can not privide such ring 
buffer.
So pstore_alloc_ring_buffer should be a funciton pointer of pstore_info struct.

Yes - that allows less capable backend like ERST and efivars to not provide the
service.  Since it becomes internal, you can drop the pstore_ prefix.  E.g.
something like:

int pstore_alloc_ring_buffer(char *name, int size)
{
  return psinfo-alloc_ring_buffer(name, size);
}
EXPORT_SYMBOL_GPL(pstore_alloc_ring_buffer);

... and you have to find/make some global header for the extern declaration.

I will make these RFC patch series according to our discussion. Thanks you very 
to
valuable advice.

Sorry for seeing your email late.We already worked out some patches to 
restructure
pstore. Would you like to try patchset 
http://article.gmane.org/gmane.linux.kernel/1697680/?

We have more patches available to add some flags to disable/enable specific 
zones.

That's great! I have tried your patches. BTW, Your patches do not work on ARM 
platform,
before I changed linker scripts;

Initially, we just implemented it on x86. It's easy to extend it to ARM. Mostly 
change the arm
vmlinux.lds.S to add the sections. Pls. also change setup_arch to allocate 
memory blocks for
pstore.
In the patchset, there is an example patch, including reserve memory and zone 
examples.
Pls. reference it.


   And can we use this method in modules(I failed to do that)?

It's a good question. There are many approaches to support modules.
1) Define the zone in built-in files and export it.Then, you can use it in 
module.
2) Define the zone and new tracer functions in built-in files and export
the tracer functions.


After a quick glance and try, I think my idea is a little different from yours. 
I will reply you
later.

Pls. Share your opinions. We are improving pstore to make it easier to be used.


This feature can use in real products (actually we have done that), because 
usually several
mega-byte-size ram is enough and it is very useful for fault location. So I 
want that pstore
can be implemented in products, not just in labs.

These seems that there are at least two ways to make pstore visible to other 
kernel users:


(1) static allocation:(your way, maybe my description is not good, please 
correct me)

When kernel image is made, zones are determinate;
So if moudules want to use a zone, we should define it in kernel source 
before compiling
and export it;

   (a)advantage:

This method will not fail at most time if it passed at first time.

   (b)disadvantage:

Engineer should change the kernel source code if he want to get a zone 
to record something. For lab,
 it is good enough; but for products, different products may use 
different kernel source codes if they
 want record different messages. It is very expensive.

(2) dynamic alloction: (ring buffer similar to your zone)

(1) We should introduce metadata to describe the ring buffers in the 
ramoops bankend.
So when initializing, we just need to read metadata. then we know 
information of all ring buffers.
So we can read and manage all ring buffers in a list named 
ring_buffer_list;

(2) when we call pstore_alloc_ring_buffer(name,size). If 
ring_buffer_list contains this ring buffer,
we do nothing if size check passes. Else we create a ring buffer 
description and add it to list;

(3) When we call pstore_write_ring_buffer(name,str). we find this ring buffer in 
ring_buffer_list by name;
and then copy strings to this ring buffer.


disadvantage:

   (1) So this alloc maybe fail when name is the same, but size isn't;
   (2) We should find a way to format the backend. erase is not enough. 
(earse just clear the data of a ring buffer,
but the ring buffer still exists).

advantage:
   (1) Os venders can implement this feature. Then people can use it 
even they can not compile the kernel.
   (2) We can determin how to use the ramoops backend at runtime, 
rather than before compiling the kernel.



So which way do we really need?
And what do you think, Tony or others ?


I got your

Re: Should Pstore(ramoops) records customized information?

2014-06-25 Thread Zhang, Yanmin


On 2014/6/25 21:08, Liu hua wrote:

于 2014/6/25 8:41, Zhang, Yanmin 写道:

On 2014/6/20 18:47, Liu hua wrote:

On 2014/6/20 7:42, Luck, Tony wrote:


BTW, I note that "extern struct pstore_info *psinfo" locates in
fs/pstore/internal.h. So users out of directory "fs/pstore/" can not use pstore 
to
record messages. We do not want other kernel users to use pstore, right?  And 
can we
break this?

Yes we can make some interface visible to the rest of the kernel ... probably
not the raw "*psinfo" though. Perhaps the pstore_alloc_ring_buffer() and
pstore_write_ring_buffer() functions should be the ones exported to the
rest of the kernel.


ditoo.. Since other backends like efi and erst may can not privide such ring 
buffer.
So pstore_alloc_ring_buffer should be a funciton pointer of pstore_info struct.

Yes - that allows less capable backend like ERST and efivars to not provide the
service.  Since it becomes internal, you can drop the "pstore_" prefix.  E.g.
something like:

int pstore_alloc_ring_buffer(char *name, int size)
{
 return psinfo->alloc_ring_buffer(name, size);
}
EXPORT_SYMBOL_GPL(pstore_alloc_ring_buffer);

... and you have to find/make some global header for the "extern" declaration.

I will make these RFC patch series according to our discussion. Thanks you very 
to
valuable advice.

Sorry for seeing your email late.We already worked out some patches to 
restructure
pstore. Would you like to try patchset 
http://article.gmane.org/gmane.linux.kernel/1697680/?

We have more patches available to add some flags to disable/enable specific 
zones.

That's great! I have tried your patches. BTW, Your patches do not work on ARM 
platform,
before I changed linker scripts;


Initially, we just implemented it on x86. It's easy to extend it to ARM. Mostly 
change the arm
vmlinux.lds.S to add the sections. Pls. also change setup_arch to allocate 
memory blocks for
pstore.
In the patchset, there is an example patch, including reserve memory and zone 
examples.
Pls. reference it.


  And can we use this method in modules(I failed to do that)?


It's a good question. There are many approaches to support modules.
1) Define the zone in built-in files and export it.Then, you can use it in 
module.
2) Define the zone and new tracer functions in built-in files and export
the tracer functions.



After a quick glance and try, I think my idea is a little different from yours. 
I will reply you
later.


Pls. Share your opinions. We are improving pstore to make it easier to be used.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Should Pstore(ramoops) records customized information?

2014-06-25 Thread Zhang, Yanmin


On 2014/6/25 21:08, Liu hua wrote:

于 2014/6/25 8:41, Zhang, Yanmin 写道:

On 2014/6/20 18:47, Liu hua wrote:

On 2014/6/20 7:42, Luck, Tony wrote:


BTW, I note that extern struct pstore_info *psinfo locates in
fs/pstore/internal.h. So users out of directory fs/pstore/ can not use pstore 
to
record messages. We do not want other kernel users to use pstore, right?  And 
can we
break this?

Yes we can make some interface visible to the rest of the kernel ... probably
not the raw *psinfo though. Perhaps the pstore_alloc_ring_buffer() and
pstore_write_ring_buffer() functions should be the ones exported to the
rest of the kernel.


ditoo.. Since other backends like efi and erst may can not privide such ring 
buffer.
So pstore_alloc_ring_buffer should be a funciton pointer of pstore_info struct.

Yes - that allows less capable backend like ERST and efivars to not provide the
service.  Since it becomes internal, you can drop the pstore_ prefix.  E.g.
something like:

int pstore_alloc_ring_buffer(char *name, int size)
{
 return psinfo-alloc_ring_buffer(name, size);
}
EXPORT_SYMBOL_GPL(pstore_alloc_ring_buffer);

... and you have to find/make some global header for the extern declaration.

I will make these RFC patch series according to our discussion. Thanks you very 
to
valuable advice.

Sorry for seeing your email late.We already worked out some patches to 
restructure
pstore. Would you like to try patchset 
http://article.gmane.org/gmane.linux.kernel/1697680/?

We have more patches available to add some flags to disable/enable specific 
zones.

That's great! I have tried your patches. BTW, Your patches do not work on ARM 
platform,
before I changed linker scripts;


Initially, we just implemented it on x86. It's easy to extend it to ARM. Mostly 
change the arm
vmlinux.lds.S to add the sections. Pls. also change setup_arch to allocate 
memory blocks for
pstore.
In the patchset, there is an example patch, including reserve memory and zone 
examples.
Pls. reference it.


  And can we use this method in modules(I failed to do that)?


It's a good question. There are many approaches to support modules.
1) Define the zone in built-in files and export it.Then, you can use it in 
module.
2) Define the zone and new tracer functions in built-in files and export
the tracer functions.



After a quick glance and try, I think my idea is a little different from yours. 
I will reply you
later.


Pls. Share your opinions. We are improving pstore to make it easier to be used.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Should Pstore(ramoops) records customized information?

2014-06-24 Thread Zhang, Yanmin


On 2014/6/20 18:47, Liu hua wrote:

On 2014/6/20 7:42, Luck, Tony wrote:


BTW, I note that "extern struct pstore_info *psinfo" locates in
fs/pstore/internal.h. So users out of directory "fs/pstore/" can not use pstore 
to
record messages. We do not want other kernel users to use pstore, right?  And 
can we
break this?

Yes we can make some interface visible to the rest of the kernel ... probably
not the raw "*psinfo" though. Perhaps the pstore_alloc_ring_buffer() and
pstore_write_ring_buffer() functions should be the ones exported to the
rest of the kernel.


ditoo.. Since other backends like efi and erst may can not privide such ring 
buffer.
So pstore_alloc_ring_buffer should be a funciton pointer of pstore_info struct.

Yes - that allows less capable backend like ERST and efivars to not provide the
service.  Since it becomes internal, you can drop the "pstore_" prefix.  E.g.
something like:

int pstore_alloc_ring_buffer(char *name, int size)
{
return psinfo->alloc_ring_buffer(name, size);
}
EXPORT_SYMBOL_GPL(pstore_alloc_ring_buffer);

... and you have to find/make some global header for the "extern" declaration.

I will make these RFC patch series according to our discussion. Thanks you very 
to
valuable advice.


Sorry for seeing your email late.We already worked out some patches to 
restructure
pstore. Would you like to try patchset 
http://article.gmane.org/gmane.linux.kernel/1697680/?

We have more patches available to add some flags to disable/enable specific 
zones.

Yanmin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Should Pstore(ramoops) records customized information?

2014-06-24 Thread Zhang, Yanmin


On 2014/6/20 18:47, Liu hua wrote:

On 2014/6/20 7:42, Luck, Tony wrote:


BTW, I note that extern struct pstore_info *psinfo locates in
fs/pstore/internal.h. So users out of directory fs/pstore/ can not use pstore 
to
record messages. We do not want other kernel users to use pstore, right?  And 
can we
break this?

Yes we can make some interface visible to the rest of the kernel ... probably
not the raw *psinfo though. Perhaps the pstore_alloc_ring_buffer() and
pstore_write_ring_buffer() functions should be the ones exported to the
rest of the kernel.


ditoo.. Since other backends like efi and erst may can not privide such ring 
buffer.
So pstore_alloc_ring_buffer should be a funciton pointer of pstore_info struct.

Yes - that allows less capable backend like ERST and efivars to not provide the
service.  Since it becomes internal, you can drop the pstore_ prefix.  E.g.
something like:

int pstore_alloc_ring_buffer(char *name, int size)
{
return psinfo-alloc_ring_buffer(name, size);
}
EXPORT_SYMBOL_GPL(pstore_alloc_ring_buffer);

... and you have to find/make some global header for the extern declaration.

I will make these RFC patch series according to our discussion. Thanks you very 
to
valuable advice.


Sorry for seeing your email late.We already worked out some patches to 
restructure
pstore. Would you like to try patchset 
http://article.gmane.org/gmane.linux.kernel/1697680/?

We have more patches available to add some flags to disable/enable specific 
zones.

Yanmin

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs

2014-06-05 Thread Zhang, Yanmin

On 2014/6/5 17:15, Peter Zijlstra wrote:

On Thu, Jun 05, 2014 at 04:00:24PM +0800, Zhang, Yanmin wrote:

On 2014/6/5 15:55, Peter Zijlstra wrote:

Why does pstore cause corruption? I thought that stuff was supposed to
be 'good' ?

pstore is good if the board is reset by WarmReset as memory content is kept
across rebooting.
If it's a ColdReset, memory might lose some or all contents. My board uses
Coldreset, which is a very
fast ColdReset. Most memory content can be kept. But sometimes, some data
has little change.

Oh, I thought pstore was persistent across cold resets or even power
outages. My bad then.


Sorry for misleading you. It depends on backend. Pstore can use RAM or
other storage, for example, EFI-provided backend. If using EFI backend,
pstore can keep persistent across cold reset or even power outrages.

If using RAM, usually pstore can keep persistent at WarmReset. Since RAM
access (even at non-cache mode) is fast and more flexible, in fact, we often use
RAM as backend of pstore.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs

2014-06-05 Thread Zhang, Yanmin

On 2014/6/5 15:55, Peter Zijlstra wrote:

On Thu, Jun 05, 2014 at 03:33:21PM +0800, Liu ShuoX wrote:

On Thu  5.Jun'14 at  9:19:19 +0200, Peter Zijlstra wrote:

On Thu, Jun 05, 2014 at 10:36:10AM +0800, Liu ShuoX wrote:

From: Zhang Yanmin 

We hit a kernel panic when running perf to collect some performance data.
kenel is x86_64 and user space apps are 32bit.

[   71.965351, 1] [   Binder_2] BUG: unable to handle kernel NULL pointer 
dereference at 0004
[   71.965360, 1] [   Binder_2] IP: [] 
get_segment_base+0x71/0xc0
[   71.965367, 1] [   Binder_2] PGD 6c65f067 PUD 0
[   71.965375, 1] [   Binder_2] Oops:  [#1] PREEMPT SMP
[   71.965413, 1] [   Binder_2] Modules linked in: ddrgx 
snd_merr_dpcm_wm8958 snd_intel_sst snd_soc_sst_platform snd_soc_wm8994 
snd_soc_wm_hubs lm3559 imx1x5 atomisp_css2401a0_v21 libmsrlisthelper rmi4 
bcm_bt_lpm videobuf_vmalloc videobuf_core fps_throttle hdmi_audio pn544(O) 
tngdisp bcm4335(O) cfg80211
[   71.965420, 1] [   Binder_2] CPU: 1 PID: 304 Comm: Binder_2 Tainted: G   
 W  O 3.10.20-263902-g184bfbc-dirty #14
[   71.965426, 1] [   Binder_2] task: 8800764dc300 ti: 88006c6e8000 
task.ti: 88006c6e8000
[   71.965439, 1] [   Binder_2] RIP: 0010:[]  
[] get_segment_base+0x71/0xc0

 ^

[   71.965<44, 1] [   Binder_2] RSP: 0018:^X8007ea87b98  EFLAGS: 
00010092

   ^   ^

[   71.965447, 1] [  !Binder_2] RAX: 0024 RBX:  
RCX: 

  ^

[   71.965450, 1] [   Binder_2] RDX:  RSI:  
RDI: 0009
[   71.965454, 1] [   Binder_2] RBP: 88007ea87ba8 R08: 83143b3c 
R09: 831848a8
[   71.965458, 1] [   Binder_2] R10:  R11: 001bf2d8 
R12: 
[   71.965462, 1] [   Binder_2] R13: 88006c6e9fd8 R14: 88006c6e9f58 
R15: 8800764dc300
[   71.965468, 1_ [   Binder_2] FS:  () 
GS:88007ea8(006b) knlGS:f704add0

 ^

Are you suffering some serious corruption?

The log captured by pstore after rebooting, so there are some
corruption. Please ignore those.

Why does pstore cause corruption? I thought that stuff was supposed to
be 'good' ?
pstore is good if the board is reset by WarmReset as memory content is 
kept across rebooting.
If it's a ColdReset, memory might lose some or all contents. My board 
uses Coldreset, which is a very
fast ColdReset. Most memory content can be kept. But sometimes, some 
data has little change.


Yanmin


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs

2014-06-05 Thread Zhang, Yanmin

On 2014/6/5 15:55, Peter Zijlstra wrote:

On Thu, Jun 05, 2014 at 03:33:21PM +0800, Liu ShuoX wrote:

On Thu  5.Jun'14 at  9:19:19 +0200, Peter Zijlstra wrote:

On Thu, Jun 05, 2014 at 10:36:10AM +0800, Liu ShuoX wrote:

From: Zhang Yanmin yanmin.zh...@intel.com

We hit a kernel panic when running perf to collect some performance data.
kenel is x86_64 and user space apps are 32bit.

[   71.965351, 1] [   Binder_2] BUG: unable to handle kernel NULL pointer 
dereference at 0004
[   71.965360, 1] [   Binder_2] IP: [82012091] 
get_segment_base+0x71/0xc0
[   71.965367, 1] [   Binder_2] PGD 6c65f067 PUD 0
[   71.965375, 1] [   Binder_2] Oops:  [#1] PREEMPT SMP
[   71.965413, 1] [   Binder_2] Modules linked in: ddrgx 
snd_merr_dpcm_wm8958 snd_intel_sst snd_soc_sst_platform snd_soc_wm8994 
snd_soc_wm_hubs lm3559 imx1x5 atomisp_css2401a0_v21 libmsrlisthelper rmi4 
bcm_bt_lpm videobuf_vmalloc videobuf_core fps_throttle hdmi_audio pn544(O) 
tngdisp bcm4335(O) cfg80211
[   71.965420, 1] [   Binder_2] CPU: 1 PID: 304 Comm: Binder_2 Tainted: G   
 W  O 3.10.20-263902-g184bfbc-dirty #14
[   71.965426, 1] [   Binder_2] task: 8800764dc300 ti: 88006c6e8000 
task.ti: 88006c6e8000
[   71.965439, 1] [   Binder_2] RIP: 0010:[82012091]  
[ff?f82012091] get_segment_base+0x71/0xc0

 ^

[   71.96544, 1] [   Binder_2] RSP: 0018:^X8007ea87b98  EFLAGS: 
00010092

   ^   ^

[   71.965447, 1] [  !Binder_2] RAX: 0024 RBX:  
RCX: 

  ^

[   71.965450, 1] [   Binder_2] RDX:  RSI:  
RDI: 0009
[   71.965454, 1] [   Binder_2] RBP: 88007ea87ba8 R08: 83143b3c 
R09: 831848a8
[   71.965458, 1] [   Binder_2] R10:  R11: 001bf2d8 
R12: 
[   71.965462, 1] [   Binder_2] R13: 88006c6e9fd8 R14: 88006c6e9f58 
R15: 8800764dc300
[   71.965468, 1_ [   Binder_2] FS:  () 
GS:88007ea8(006b) knlGS:f704add0

 ^

Are you suffering some serious corruption?

The log captured by pstore after rebooting, so there are some
corruption. Please ignore those.

Why does pstore cause corruption? I thought that stuff was supposed to
be 'good' ?
pstore is good if the board is reset by WarmReset as memory content is 
kept across rebooting.
If it's a ColdReset, memory might lose some or all contents. My board 
uses Coldreset, which is a very
fast ColdReset. Most memory content can be kept. But sometimes, some 
data has little change.


Yanmin


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs

2014-06-05 Thread Zhang, Yanmin

On 2014/6/5 17:15, Peter Zijlstra wrote:

On Thu, Jun 05, 2014 at 04:00:24PM +0800, Zhang, Yanmin wrote:

On 2014/6/5 15:55, Peter Zijlstra wrote:

Why does pstore cause corruption? I thought that stuff was supposed to
be 'good' ?

pstore is good if the board is reset by WarmReset as memory content is kept
across rebooting.
If it's a ColdReset, memory might lose some or all contents. My board uses
Coldreset, which is a very
fast ColdReset. Most memory content can be kept. But sometimes, some data
has little change.

Oh, I thought pstore was persistent across cold resets or even power
outages. My bad then.


Sorry for misleading you. It depends on backend. Pstore can use RAM or
other storage, for example, EFI-provided backend. If using EFI backend,
pstore can keep persistent across cold reset or even power outrages.

If using RAM, usually pstore can keep persistent at WarmReset. Since RAM
access (even at non-cache mode) is fast and more flexible, in fact, we often use
RAM as backend of pstore.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: linux-next: build failure after merge of the ia64 tree

2014-03-30 Thread Zhang, Yanmin
Sorry. We would fix it ASAP.

Yanmin

-Original Message-
From: Stephen Rothwell [mailto:s...@canb.auug.org.au] 
Sent: Monday, March 31, 2014 8:08 AM
To: Luck, Tony
Cc: linux-n...@vger.kernel.org; linux-kernel@vger.kernel.org; Liu, ShuoX; 
Zhang, Yanmin
Subject: linux-next: build failure after merge of the ia64 tree

Hi all,

After merging the ia64 tree, today's linux-next build (x86_64
allmodconfig) failed like this:

fs/built-in.o: In function `pstore_file_open':
inode.c:(.text+0xa29e9): undefined reference to `persistent_ram_size'

Caused by commit f5d7e81eda59 ("pstore: Support current records dump in 
ramoops").

I have used the ia64 tree from next-20140328 for today.

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: linux-next: build failure after merge of the ia64 tree

2014-03-30 Thread Zhang, Yanmin
Sorry. We would fix it ASAP.

Yanmin

-Original Message-
From: Stephen Rothwell [mailto:s...@canb.auug.org.au] 
Sent: Monday, March 31, 2014 8:08 AM
To: Luck, Tony
Cc: linux-n...@vger.kernel.org; linux-kernel@vger.kernel.org; Liu, ShuoX; 
Zhang, Yanmin
Subject: linux-next: build failure after merge of the ia64 tree

Hi all,

After merging the ia64 tree, today's linux-next build (x86_64
allmodconfig) failed like this:

fs/built-in.o: In function `pstore_file_open':
inode.c:(.text+0xa29e9): undefined reference to `persistent_ram_size'

Caused by commit f5d7e81eda59 (pstore: Support current records dump in 
ramoops).

I have used the ia64 tree from next-20140328 for today.

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] pstore: skip zero size persistent ram buffer in traverse

2014-03-02 Thread Zhang, Yanmin

On 2014/2/28 14:38, shuox@intel.com wrote:

From: Liu ShuoX 

In ramoops_pstore_read, a valid prz pointer with zero size buffer will
break traverse of all persistent ram buffers. The latter buffer might
be lost.


Andrew,

Would you like to merge it to your testing tree?
pstore is a very important feature for debugging hard issues on
Android mobiles.

Yanmin



Signed-off-by: Liu ShuoX 
---
  fs/pstore/ram.c | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
index a5d0cab..929ea55 100644
--- a/fs/pstore/ram.c
+++ b/fs/pstore/ram.c
@@ -119,12 +119,12 @@ ramoops_get_next_prz(struct persistent_ram_zone *przs[], 
uint *c, uint max,
  
  	prz = przs[i];
  
-	if (update) {

-   /* Update old/shadowed buffer. */
+   /* Update old/shadowed buffer. */
+   if (update)
persistent_ram_save_old(prz);
-   if (!persistent_ram_old_size(prz))
-   return NULL;
-   }
+
+   if (!persistent_ram_old_size(prz))
+   return NULL;
  
  	*typep = type;

*id = i;


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] pstore: reset ftrace_read_cnt at ramoops_pstore_open

2014-03-02 Thread Zhang, Yanmin

On 2014/2/28 14:37, shuox@intel.com wrote:

From: Liu ShuoX 

ftrace_read_cnt need to be reset in open to support mutli times
getting the records.


Andrew,

Would you like to merge it to your testing tree?
pstore is a very important feature for debugging hard issues on
Android mobiles.

Yanmin



Signed-off-by: Liu ShuoX 
---
  fs/pstore/ram.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
index fa8cef2..a5d0cab 100644
--- a/fs/pstore/ram.c
+++ b/fs/pstore/ram.c
@@ -101,6 +101,7 @@ static int ramoops_pstore_open(struct pstore_info *psi)
  
  	cxt->dump_read_cnt = 0;

cxt->console_read_cnt = 0;
+   cxt->ftrace_read_cnt = 0;
return 0;
  }
  


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] pstore: reset ftrace_read_cnt at ramoops_pstore_open

2014-03-02 Thread Zhang, Yanmin

On 2014/2/28 14:37, shuox@intel.com wrote:

From: Liu ShuoX shuox@intel.com

ftrace_read_cnt need to be reset in open to support mutli times
getting the records.


Andrew,

Would you like to merge it to your testing tree?
pstore is a very important feature for debugging hard issues on
Android mobiles.

Yanmin



Signed-off-by: Liu ShuoX shuox@intel.com
---
  fs/pstore/ram.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
index fa8cef2..a5d0cab 100644
--- a/fs/pstore/ram.c
+++ b/fs/pstore/ram.c
@@ -101,6 +101,7 @@ static int ramoops_pstore_open(struct pstore_info *psi)
  
  	cxt-dump_read_cnt = 0;

cxt-console_read_cnt = 0;
+   cxt-ftrace_read_cnt = 0;
return 0;
  }
  


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] pstore: skip zero size persistent ram buffer in traverse

2014-03-02 Thread Zhang, Yanmin

On 2014/2/28 14:38, shuox@intel.com wrote:

From: Liu ShuoX shuox@intel.com

In ramoops_pstore_read, a valid prz pointer with zero size buffer will
break traverse of all persistent ram buffers. The latter buffer might
be lost.


Andrew,

Would you like to merge it to your testing tree?
pstore is a very important feature for debugging hard issues on
Android mobiles.

Yanmin



Signed-off-by: Liu ShuoX shuox@intel.com
---
  fs/pstore/ram.c | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
index a5d0cab..929ea55 100644
--- a/fs/pstore/ram.c
+++ b/fs/pstore/ram.c
@@ -119,12 +119,12 @@ ramoops_get_next_prz(struct persistent_ram_zone *przs[], 
uint *c, uint max,
  
  	prz = przs[i];
  
-	if (update) {

-   /* Update old/shadowed buffer. */
+   /* Update old/shadowed buffer. */
+   if (update)
persistent_ram_save_old(prz);
-   if (!persistent_ram_old_size(prz))
-   return NULL;
-   }
+
+   if (!persistent_ram_old_size(prz))
+   return NULL;
  
  	*typep = type;

*id = i;


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] klist: del waiter from klist_remove_waiters before wakeup waitting process

2013-05-07 Thread Zhang, Yanmin
Andrew,

Would you like to merge the patch to your MM tree?

Yanmin

>-Original Message-
>From: Peter Zijlstra [mailto:pet...@infradead.org]
>Sent: Friday, May 03, 2013 6:42 PM
>To: Wang, Biao
>Cc: a...@linux-foundation.org; linux-kernel@vger.kernel.org;
>mi...@redhat.com; Zhang, Yanmin; moc...@digitalimplant.org
>Subject: Re: [PATCH] klist: del waiter from klist_remove_waiters before wakeup
>waitting process
>
>On Fri, May 03, 2013 at 03:06:36PM +0800, wangbiao wrote:
>> From: "wang, biao" 
>> Date: Fri, 3 May 2013 14:18:34 +0800
>> Subject: [PATCH] klist: del waiter from klist_remove_waiters before
>> wakeup  waitting process
>>
>> There is a race between klist_remove and klist_release. klist_remove
>> uses a local var waiter saved on stack. When klist_release calls
>> wake_up_process(waiter->process) to wake up the waiter, waiter might
>> run immediately and reuse the stack. Then, klist_release calls
>> list_del(>list) to change previous wait data and cause prior
>> waiter thread corrupt.
>>
>> The patch fixes it against kernel 3.9.
>
>
>I've never seen that code before in my life; but after a quick look you appear 
>to
>be completely right.
>
>> Signed-off-by: wang, biao 
>
>Acked-by: Peter Zijlstra 
>
>> ---
>>  lib/klist.c |2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/lib/klist.c b/lib/klist.c index 0874e41..358a368 100644
>> --- a/lib/klist.c
>> +++ b/lib/klist.c
>> @@ -193,10 +193,10 @@ static void klist_release(struct kref *kref)
>>  if (waiter->node != n)
>>  continue;
>>
>> +list_del(>list);
>>  waiter->woken = 1;
>>  mb();
>>  wake_up_process(waiter->process);
>> -list_del(>list);
>>  }
>>  spin_unlock(_remove_lock);
>>  knode_set_klist(n, NULL);
>> --
>> 1.7.6
>>
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] hrtimer:__run_hrtimer races with enqueue_hrtimer

2012-10-26 Thread Zhang, Yanmin
>-Original Message-
>From: Thomas Gleixner [mailto:t...@linutronix.de]
>Sent: Friday, October 26, 2012 5:55 PM
>To: He, Bo
>Cc: linux-kernel@vger.kernel.org; Peter Zijlstra; Ingo Molnar;
>yanmin_zh...@linux.intel.com; Zhang, Yanmin
>Subject: Re: [PATCH] hrtimer:__run_hrtimer races with enqueue_hrtimer
>
>On Fri, 26 Oct 2012, he, bo wrote:
>> From: Yanmin Zhang 
>>
>> We hit a kernel panic at __run_hrtimer=>BUG_ON(timer->state !=
>HRTIMER_STATE_CALLBACK).
>> <2>[   10.226053, 3] kernel BUG at
>/home/android/xiaobing/ymz/r4/hardware/intel/linux-2.6/kernel/hrtimer.c:1228
>!
>>
>> Basically, __run_hrtimer has a race with enqueue_hrtimer. When
>> __run_hrtimer calls the timer callback fn, another thread might call
>> enqueue_hrtimer or hrtimer_start to requeue it, and the timer->state
>> is equal to HRTIMER_STATE_CALLBACK|HRTIMER_STATE_ENQUEUED, which
>> causes the BUG_ON(timer->state != HRTIMER_STATE_CALLBACK) checking
>> fails.
>>
>> The patch fixes it by checking only bit HRTIMER_STATE_CALLBACK.
>
>This does not fix it. It makes it worse.
Thanks for your kind comments. We found that and sent V2 at 
https://lkml.org/lkml/2012/10/26/172.

>
>> Signed-off-by: Yanmin Zhang 
>> Reviewed-by: He, Bo 
>> ---
>>  kernel/hrtimer.c |2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
>> index 6db7a5e..6280184 100644
>> --- a/kernel/hrtimer.c
>> +++ b/kernel/hrtimer.c
>> @@ -1235,7 +1235,7 @@ static void __run_hrtimer(struct hrtimer *timer,
>ktime_t *now)
>>   * hrtimer_start_range_ns() or in hrtimer_interrupt()
>>   */
>>  if (restart != HRTIMER_NORESTART) {
>> -BUG_ON(timer->state != HRTIMER_STATE_CALLBACK);
>> +BUG_ON(!(timer->state & HRTIMER_STATE_CALLBACK));
>>  enqueue_hrtimer(timer, base);
>>  }
>
>What you are allowing here is enqueueing an already enqueued timer
>again. I don't know why this does not explode elsewhere, but that's
>probably pure luck. It's not allowed to double enqueue a timer.
You are right.

>
>So no, this is not a solution. The problem is not in the core timer
>code, the problem is in the code which uses that timer.
>
>Your code is returning HRTIMER_RESTART from the timer callback and at
>the same time it starts the timer from some other context. That's what
>needs to be fixed.
The timer user should fix it. But could we also change hrtimer to make it more 
stable?
At least, instead of panic, could we print some information and go ahead to let 
kernel
continue?

Thanks,
Yanmin


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] hrtimer:__run_hrtimer races with enqueue_hrtimer

2012-10-26 Thread Zhang, Yanmin
-Original Message-
From: Thomas Gleixner [mailto:t...@linutronix.de]
Sent: Friday, October 26, 2012 5:55 PM
To: He, Bo
Cc: linux-kernel@vger.kernel.org; Peter Zijlstra; Ingo Molnar;
yanmin_zh...@linux.intel.com; Zhang, Yanmin
Subject: Re: [PATCH] hrtimer:__run_hrtimer races with enqueue_hrtimer

On Fri, 26 Oct 2012, he, bo wrote:
 From: Yanmin Zhang yanmin.zh...@intel.com

 We hit a kernel panic at __run_hrtimer=BUG_ON(timer-state !=
HRTIMER_STATE_CALLBACK).
 2[   10.226053, 3] kernel BUG at
/home/android/xiaobing/ymz/r4/hardware/intel/linux-2.6/kernel/hrtimer.c:1228
!

 Basically, __run_hrtimer has a race with enqueue_hrtimer. When
 __run_hrtimer calls the timer callback fn, another thread might call
 enqueue_hrtimer or hrtimer_start to requeue it, and the timer-state
 is equal to HRTIMER_STATE_CALLBACK|HRTIMER_STATE_ENQUEUED, which
 causes the BUG_ON(timer-state != HRTIMER_STATE_CALLBACK) checking
 fails.

 The patch fixes it by checking only bit HRTIMER_STATE_CALLBACK.

This does not fix it. It makes it worse.
Thanks for your kind comments. We found that and sent V2 at 
https://lkml.org/lkml/2012/10/26/172.


 Signed-off-by: Yanmin Zhang yanmin.zh...@intel.com
 Reviewed-by: He, Bo bo...@intel.com
 ---
  kernel/hrtimer.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

 diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
 index 6db7a5e..6280184 100644
 --- a/kernel/hrtimer.c
 +++ b/kernel/hrtimer.c
 @@ -1235,7 +1235,7 @@ static void __run_hrtimer(struct hrtimer *timer,
ktime_t *now)
   * hrtimer_start_range_ns() or in hrtimer_interrupt()
   */
  if (restart != HRTIMER_NORESTART) {
 -BUG_ON(timer-state != HRTIMER_STATE_CALLBACK);
 +BUG_ON(!(timer-state  HRTIMER_STATE_CALLBACK));
  enqueue_hrtimer(timer, base);
  }

What you are allowing here is enqueueing an already enqueued timer
again. I don't know why this does not explode elsewhere, but that's
probably pure luck. It's not allowed to double enqueue a timer.
You are right.


So no, this is not a solution. The problem is not in the core timer
code, the problem is in the code which uses that timer.

Your code is returning HRTIMER_RESTART from the timer callback and at
the same time it starts the timer from some other context. That's what
needs to be fixed.
The timer user should fix it. But could we also change hrtimer to make it more 
stable?
At least, instead of panic, could we print some information and go ahead to let 
kernel
continue?

Thanks,
Yanmin


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: Why hold device_lock when calling callback in pci_walk_bus?

2012-09-28 Thread Zhang, Yanmin
Some error handling functions call pci_walk_bus. For example, pci-e aer. Here 
we lock the device, so the driver wouldn't detach from the device, as the cb 
might call driver's callback function.

-Original Message-
From: Huang, Ying 
Sent: Friday, September 28, 2012 4:15 PM
To: bhelg...@google.com
Cc: Greg Kroah-Hartman; Zhang, Yanmin; linux-...@vger.kernel.org; 
linux-kernel@vger.kernel.org; r...@sisk.pl
Subject: Why hold device_lock when calling callback in pci_walk_bus?

Hi, All,

If my understanding were correct, device_lock is used to provide mutual 
exclusion between device probe/remove/suspend/resume etc.  Why hold device_lock 
when calling callback in pci_walk_bus.

This is introduced by the following commit.

commit d71374dafbba7ec3f67371d3b7e9f6310a588808
Author: Zhang Yanmin 
Date:   Fri Jun 2 12:35:43 2006 +0800

[PATCH] PCI: fix race with pci_walk_bus and pci_destroy_dev

pci_walk_bus has a race with pci_destroy_dev. When cb is called
in pci_walk_bus, pci_destroy_dev might unlink the dev pointed by next.
Later on in the next loop, pointer next becomes NULL and cause
kernel panic.

Below patch against 2.6.17-rc4 fixes it by changing pci_bus_lock (spin_lock)
to pci_bus_sem (rw_semaphore).

Signed-off-by: Zhang Yanmin 
Signed-off-by: Greg Kroah-Hartman 

Corresponding email thread is: https://lkml.org/lkml/2006/5/26/38

But from the commit and email thread, I can not find why we need to do that.

I ask this question because I want to use pci_walk_bus in a function (in pci 
runtime resume path) which may be called with device_lock held.

Can anyone help me on that?

Best Regards,
Huang Ying




RE: Why hold device_lock when calling callback in pci_walk_bus?

2012-09-28 Thread Zhang, Yanmin
Some error handling functions call pci_walk_bus. For example, pci-e aer. Here 
we lock the device, so the driver wouldn't detach from the device, as the cb 
might call driver's callback function.

-Original Message-
From: Huang, Ying 
Sent: Friday, September 28, 2012 4:15 PM
To: bhelg...@google.com
Cc: Greg Kroah-Hartman; Zhang, Yanmin; linux-...@vger.kernel.org; 
linux-kernel@vger.kernel.org; r...@sisk.pl
Subject: Why hold device_lock when calling callback in pci_walk_bus?

Hi, All,

If my understanding were correct, device_lock is used to provide mutual 
exclusion between device probe/remove/suspend/resume etc.  Why hold device_lock 
when calling callback in pci_walk_bus.

This is introduced by the following commit.

commit d71374dafbba7ec3f67371d3b7e9f6310a588808
Author: Zhang Yanmin yanmin.zh...@intel.com
Date:   Fri Jun 2 12:35:43 2006 +0800

[PATCH] PCI: fix race with pci_walk_bus and pci_destroy_dev

pci_walk_bus has a race with pci_destroy_dev. When cb is called
in pci_walk_bus, pci_destroy_dev might unlink the dev pointed by next.
Later on in the next loop, pointer next becomes NULL and cause
kernel panic.

Below patch against 2.6.17-rc4 fixes it by changing pci_bus_lock (spin_lock)
to pci_bus_sem (rw_semaphore).

Signed-off-by: Zhang Yanmin yanmin.zh...@intel.com
Signed-off-by: Greg Kroah-Hartman gre...@suse.de

Corresponding email thread is: https://lkml.org/lkml/2006/5/26/38

But from the commit and email thread, I can not find why we need to do that.

I ask this question because I want to use pci_walk_bus in a function (in pci 
runtime resume path) which may be called with device_lock held.

Can anyone help me on that?

Best Regards,
Huang Ying




Re: tbench regression in 2.6.25-rc1

2008-02-20 Thread Zhang, Yanmin
Comparing with kernel 2.6.24, tbench result has regression with
2.6.25-rc1.
1) On 2 quad-core processor stoakley: 4%.
2) On 4 quad-core processor tigerton: more than 30%.

bisect located below patch.

b4ce92775c2e7ff9cf79cca4e0a19c8c5fd6287b is first bad commit
commit b4ce92775c2e7ff9cf79cca4e0a19c8c5fd6287b
Author: Herbert Xu <[EMAIL PROTECTED]>
Date:   Tue Nov 13 21:33:32 2007 -0800

[IPV6]: Move nfheader_len into rt6_info

The dst member nfheader_len is only used by IPv6.  It's also currently
creating a rather ugly alignment hole in struct dst.  Therefore this patch
moves it from there into struct rt6_info.

Above patch changes the cache line alignment, especially member __refcnt. I did 
a 
testing by adding 2 unsigned long pading before lastuse, so the 3 members,
lastuse/__refcnt/__use, are moved to next cache line. The performance is 
recovered.

I created a patch to rearrange the members in struct dst_entry.

With Eric and Valdis Kletnieks's suggestion, I made finer arrangement.
1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y. So 
sizeof(dst_entry)=200
no matter if CONFIG_NET_CLS_ROUTE=y/n. I tested many patches on my 16-core 
tigerton by
moving tclassid to different place. It looks like tclassid could also have 
impact on
performance.
If moving tclassid before metrics, or just don't move tclassid, the performance 
isn't
good. So I move it behind metrics.
2) Add comments before __refcnt.

On 16-core tigerton:
If CONFIG_NET_CLS_ROUTE=y, the result with below patch is about 18% better than
the one without the patch;
If CONFIG_NET_CLS_ROUTE=n, the result with below patch is about 30% better than
the one without the patch.

With 32bit 2.6.25-rc1 on 8-core stoakley, the new patch doesn't introduce 
regression.

Thank Eric, Valdis, and David!

Signed-off-by: Zhang Yanmin <[EMAIL PROTECTED]>
Acked-by: Eric Dumazet <[EMAIL PROTECTED]>

---

--- linux-2.6.25-rc1/include/net/dst.h  2008-02-21 14:33:43.0 +0800
+++ linux-2.6.25-rc1_work/include/net/dst.h 2008-02-22 12:52:19.0 
+0800
@@ -52,15 +52,10 @@ struct dst_entry
unsigned short  header_len; /* more space at head required 
*/
unsigned short  trailer_len;/* space to reserve at tail */
 
-   u32 metrics[RTAX_MAX];
-   struct dst_entry*path;
-
-   unsigned long   rate_last;  /* rate limiting for ICMP */
unsigned intrate_tokens;
+   unsigned long   rate_last;  /* rate limiting for ICMP */
 
-#ifdef CONFIG_NET_CLS_ROUTE
-   __u32   tclassid;
-#endif
+   struct dst_entry*path;
 
struct neighbour*neighbour;
struct hh_cache *hh;
@@ -70,10 +65,20 @@ struct dst_entry
int (*output)(struct sk_buff*);
 
struct  dst_ops *ops;
-   
-   unsigned long   lastuse;
+
+   u32 metrics[RTAX_MAX];
+
+#ifdef CONFIG_NET_CLS_ROUTE
+   __u32   tclassid;
+#endif
+
+   /*
+* __refcnt wants to be on a different cache line from
+* input/output/ops or performance tanks badly
+*/
atomic_t__refcnt;   /* client references*/
int __use;
+   unsigned long   lastuse;
union {
struct dst_entry *next;
struct rtable*rt_next;


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: tbench regression in 2.6.25-rc1

2008-02-20 Thread Zhang, Yanmin
Comparing with kernel 2.6.24, tbench result has regression with
2.6.25-rc1.
1) On 2 quad-core processor stoakley: 4%.
2) On 4 quad-core processor tigerton: more than 30%.

bisect located below patch.

b4ce92775c2e7ff9cf79cca4e0a19c8c5fd6287b is first bad commit
commit b4ce92775c2e7ff9cf79cca4e0a19c8c5fd6287b
Author: Herbert Xu [EMAIL PROTECTED]
Date:   Tue Nov 13 21:33:32 2007 -0800

[IPV6]: Move nfheader_len into rt6_info

The dst member nfheader_len is only used by IPv6.  It's also currently
creating a rather ugly alignment hole in struct dst.  Therefore this patch
moves it from there into struct rt6_info.

Above patch changes the cache line alignment, especially member __refcnt. I did 
a 
testing by adding 2 unsigned long pading before lastuse, so the 3 members,
lastuse/__refcnt/__use, are moved to next cache line. The performance is 
recovered.

I created a patch to rearrange the members in struct dst_entry.

With Eric and Valdis Kletnieks's suggestion, I made finer arrangement.
1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y. So 
sizeof(dst_entry)=200
no matter if CONFIG_NET_CLS_ROUTE=y/n. I tested many patches on my 16-core 
tigerton by
moving tclassid to different place. It looks like tclassid could also have 
impact on
performance.
If moving tclassid before metrics, or just don't move tclassid, the performance 
isn't
good. So I move it behind metrics.
2) Add comments before __refcnt.

On 16-core tigerton:
If CONFIG_NET_CLS_ROUTE=y, the result with below patch is about 18% better than
the one without the patch;
If CONFIG_NET_CLS_ROUTE=n, the result with below patch is about 30% better than
the one without the patch.

With 32bit 2.6.25-rc1 on 8-core stoakley, the new patch doesn't introduce 
regression.

Thank Eric, Valdis, and David!

Signed-off-by: Zhang Yanmin [EMAIL PROTECTED]
Acked-by: Eric Dumazet [EMAIL PROTECTED]

---

--- linux-2.6.25-rc1/include/net/dst.h  2008-02-21 14:33:43.0 +0800
+++ linux-2.6.25-rc1_work/include/net/dst.h 2008-02-22 12:52:19.0 
+0800
@@ -52,15 +52,10 @@ struct dst_entry
unsigned short  header_len; /* more space at head required 
*/
unsigned short  trailer_len;/* space to reserve at tail */
 
-   u32 metrics[RTAX_MAX];
-   struct dst_entry*path;
-
-   unsigned long   rate_last;  /* rate limiting for ICMP */
unsigned intrate_tokens;
+   unsigned long   rate_last;  /* rate limiting for ICMP */
 
-#ifdef CONFIG_NET_CLS_ROUTE
-   __u32   tclassid;
-#endif
+   struct dst_entry*path;
 
struct neighbour*neighbour;
struct hh_cache *hh;
@@ -70,10 +65,20 @@ struct dst_entry
int (*output)(struct sk_buff*);
 
struct  dst_ops *ops;
-   
-   unsigned long   lastuse;
+
+   u32 metrics[RTAX_MAX];
+
+#ifdef CONFIG_NET_CLS_ROUTE
+   __u32   tclassid;
+#endif
+
+   /*
+* __refcnt wants to be on a different cache line from
+* input/output/ops or performance tanks badly
+*/
atomic_t__refcnt;   /* client references*/
int __use;
+   unsigned long   lastuse;
union {
struct dst_entry *next;
struct rtable*rt_next;


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: tbench regression in 2.6.25-rc1

2008-02-19 Thread Zhang, Yanmin
On Tue, 2008-02-19 at 08:40 +0100, Eric Dumazet wrote:
> Zhang, Yanmin a �crit :
> > On Mon, 2008-02-18 at 12:33 -0500, [EMAIL PROTECTED] wrote: 
> >> On Mon, 18 Feb 2008 16:12:38 +0800, "Zhang, Yanmin" said:
> >>
> >>> I also think __refcnt is the key. I did a new testing by adding 2 
> >>> unsigned long
> >>> pading before lastuse, so the 3 members are moved to next cache line. The 
> >>> performance is
> >>> recovered.
> >>>
> >>> How about below patch? Almost all performance is recovered with the new 
> >>> patch.
> >>>
> >>> Signed-off-by: Zhang Yanmin <[EMAIL PROTECTED]>
> >> Could you add a comment someplace that says "refcnt wants to be on a 
> >> different
> >> cache line from input/output/ops or performance tanks badly", to warn some
> >> future kernel hacker who starts adding new fields to the structure?
> > Ok. Below is the new patch.
> > 
> > 1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y. So 
> > sizeof(dst_entry)=200
> > no matter if CONFIG_NET_CLS_ROUTE=y/n. I tested many patches on my 16-core 
> > tigerton by
> > moving tclassid to different place. It looks like tclassid could also have 
> > impact on
> > performance.
> > If moving tclassid before metrics, or just don't move tclassid, the 
> > performance isn't
> > good. So I move it behind metrics.
> > 
> > 2) Add comments before __refcnt.
> > 
> > If CONFIG_NET_CLS_ROUTE=y, the result with below patch is about 18% better 
> > than
> > the one without the patch.
> > 
> > If CONFIG_NET_CLS_ROUTE=n, the result with below patch is about 30% better 
> > than
> > the one without the patch.
> > 
> > Signed-off-by: Zhang Yanmin <[EMAIL PROTECTED]>
> > 
> > ---
> > 
> > --- linux-2.6.25-rc1/include/net/dst.h  2008-02-21 14:33:43.0 
> > +0800
> > +++ linux-2.6.25-rc1_work/include/net/dst.h 2008-02-22 12:52:19.0 
> > +0800
> > @@ -52,15 +52,10 @@ struct dst_entry
> > unsigned short  header_len; /* more space at head required 
> > */
> > unsigned short  trailer_len;/* space to reserve at tail */
> >  
> > -   u32 metrics[RTAX_MAX];
> > -   struct dst_entry*path;
> > -
> > -   unsigned long   rate_last;  /* rate limiting for ICMP */
> > unsigned intrate_tokens;
> > +   unsigned long   rate_last;  /* rate limiting for ICMP */
> >  
> > -#ifdef CONFIG_NET_CLS_ROUTE
> > -   __u32   tclassid;
> > -#endif
> > +   struct dst_entry*path;
> >  
> > struct neighbour*neighbour;
> > struct hh_cache *hh;
> > @@ -70,10 +65,20 @@ struct dst_entry
> > int (*output)(struct sk_buff*);
> >  
> > struct  dst_ops *ops;
> > -   
> > -   unsigned long   lastuse;
> > +
> > +   u32 metrics[RTAX_MAX];
> > +
> > +#ifdef CONFIG_NET_CLS_ROUTE
> > +   __u32   tclassid;
> > +#endif
> > +
> > +   /*
> > +* __refcnt wants to be on a different cache line from
> > +* input/output/ops or performance tanks badly
> > +*/
> > atomic_t__refcnt;   /* client references*/
> > int __use;
> > +   unsigned long   lastuse;
> > union {
> > struct dst_entry *next;
> > struct rtable*rt_next;
> > 
> > 
> > 
> 
> I prefer this patch, but unfortunatly your perf numbers are for 64 bits 
> kernels.
> 
> Could you please test now with 32 bits one ?
I tested it with 32bit 2.6.25-rc1 on 8-core stoakley. The result almost has no 
difference
between pure kernel and patched kernel.

New update: On 8-core stoakley, the regression becomes 2~3% with kernel 
2.6.25-rc2. On
tigerton, the regression is still 30% with 2.6.25-rc2. On Tulsa( 8 
cores+hyperthreading),
the regression is still 4% with 2.6.25-rc2.

With my patch, on tigerton, almost all regression disappears. On tulsa, only 
about 2%
regression disappears.

So this issue is triggerred with multiple-cpu. Perhaps process scheduler is 
another
factor causing the issue to happen, but it's very hard to change scheduler.


Eric,

I tested your new patch in function loopback_xmit. It has no improvement, while 
it doesn't
introduce new issues. As you tested it on dual-core machine and got 
improvement, how about
merging your patch with mine?

-yanmin


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   3   >