RE: [PATCH] sched: don't output cpu sched info by default
I double-check it. Function show_state_filter chooses to print all tasks when state_filter == 0 (TASK_RUNNING). sysrq_sched_debug_show prints per cpu sched debug info. They are different info. It is better to use sched_debug_enabled to control the 2nd part output. Yanmin -Original Message- From: Zhang, Yanmin Sent: Tuesday, April 26, 2016 9:04 AM To: Afzal Mohammed; Zhang, LongX Cc: mi...@redhat.com; pet...@infradead.org; linux-kernel@vger.kernel.org Subject: RE: [PATCH] sched: don't output cpu sched info by default Thanks for the pointer. We didn't notice that. That patch also fixes the issue indeed. Yanmin -Original Message- From: Afzal Mohammed [mailto:afzal.mohd...@gmail.com] Sent: Monday, April 25, 2016 6:50 PM To: Zhang, LongX Cc: mi...@redhat.com; pet...@infradead.org; linux-kernel@vger.kernel.org; Zhang, Yanmin Subject: Re: [PATCH] sched: don't output cpu sched info by default Hi, On Mon, Apr 25, 2016 at 01:04:41PM +0800, Zhang Long wrote: > Android userspace debug prcoess might dump system info by sysrq. > One info is of cpu sched. Usually, one thread has one line dump. > Such log is huge sometimes and such dumping spends lots of time and > make the system worse. Sometimes, watchdog resets the system in the > end. > > The patch fixes it by dumping cpu sched info only when > sched_debug_enabled is true. Doesn't "sched: don't dump sched debug info in SysRq-W" currently in tip/sched/core help ?, http://lkml.kernel.org/r/1459777322-30902-1-git-send-email-rabin.vinc...@axis.com Regards afzal
RE: [PATCH] sched: don't output cpu sched info by default
I double-check it. Function show_state_filter chooses to print all tasks when state_filter == 0 (TASK_RUNNING). sysrq_sched_debug_show prints per cpu sched debug info. They are different info. It is better to use sched_debug_enabled to control the 2nd part output. Yanmin -Original Message- From: Zhang, Yanmin Sent: Tuesday, April 26, 2016 9:04 AM To: Afzal Mohammed; Zhang, LongX Cc: mi...@redhat.com; pet...@infradead.org; linux-kernel@vger.kernel.org Subject: RE: [PATCH] sched: don't output cpu sched info by default Thanks for the pointer. We didn't notice that. That patch also fixes the issue indeed. Yanmin -Original Message- From: Afzal Mohammed [mailto:afzal.mohd...@gmail.com] Sent: Monday, April 25, 2016 6:50 PM To: Zhang, LongX Cc: mi...@redhat.com; pet...@infradead.org; linux-kernel@vger.kernel.org; Zhang, Yanmin Subject: Re: [PATCH] sched: don't output cpu sched info by default Hi, On Mon, Apr 25, 2016 at 01:04:41PM +0800, Zhang Long wrote: > Android userspace debug prcoess might dump system info by sysrq. > One info is of cpu sched. Usually, one thread has one line dump. > Such log is huge sometimes and such dumping spends lots of time and > make the system worse. Sometimes, watchdog resets the system in the > end. > > The patch fixes it by dumping cpu sched info only when > sched_debug_enabled is true. Doesn't "sched: don't dump sched debug info in SysRq-W" currently in tip/sched/core help ?, http://lkml.kernel.org/r/1459777322-30902-1-git-send-email-rabin.vinc...@axis.com Regards afzal
RE: [PATCH] sched: don't output cpu sched info by default
Thanks for the pointer. We didn't notice that. That patch also fixes the issue indeed. Yanmin -Original Message- From: Afzal Mohammed [mailto:afzal.mohd...@gmail.com] Sent: Monday, April 25, 2016 6:50 PM To: Zhang, LongX Cc: mi...@redhat.com; pet...@infradead.org; linux-kernel@vger.kernel.org; Zhang, Yanmin Subject: Re: [PATCH] sched: don't output cpu sched info by default Hi, On Mon, Apr 25, 2016 at 01:04:41PM +0800, Zhang Long wrote: > Android userspace debug prcoess might dump system info by sysrq. > One info is of cpu sched. Usually, one thread has one line dump. > Such log is huge sometimes and such dumping spends lots of time and > make the system worse. Sometimes, watchdog resets the system in the > end. > > The patch fixes it by dumping cpu sched info only when > sched_debug_enabled is true. Doesn't "sched: don't dump sched debug info in SysRq-W" currently in tip/sched/core help ?, http://lkml.kernel.org/r/1459777322-30902-1-git-send-email-rabin.vinc...@axis.com Regards afzal
RE: [PATCH] sched: don't output cpu sched info by default
Thanks for the pointer. We didn't notice that. That patch also fixes the issue indeed. Yanmin -Original Message- From: Afzal Mohammed [mailto:afzal.mohd...@gmail.com] Sent: Monday, April 25, 2016 6:50 PM To: Zhang, LongX Cc: mi...@redhat.com; pet...@infradead.org; linux-kernel@vger.kernel.org; Zhang, Yanmin Subject: Re: [PATCH] sched: don't output cpu sched info by default Hi, On Mon, Apr 25, 2016 at 01:04:41PM +0800, Zhang Long wrote: > Android userspace debug prcoess might dump system info by sysrq. > One info is of cpu sched. Usually, one thread has one line dump. > Such log is huge sometimes and such dumping spends lots of time and > make the system worse. Sometimes, watchdog resets the system in the > end. > > The patch fixes it by dumping cpu sched info only when > sched_debug_enabled is true. Doesn't "sched: don't dump sched debug info in SysRq-W" currently in tip/sched/core help ?, http://lkml.kernel.org/r/1459777322-30902-1-git-send-email-rabin.vinc...@axis.com Regards afzal
Re: [PATCH] trace: correct start_index in find_next
On 2016/1/7 11:50, Steven Rostedt wrote: > On Thu, 07 Jan 2016 10:56:56 +0800 > "Zhang, Yanmin" wrote: > >> How is this patch? It fixes a kernel panic. >> > Linus already pulled it. It's also marked for stable. See commit: > f36d1be2930ede0a1947686e1126ffda5d5ee1bb in Linus's tree. > > I renamed the subject slightly to: > > "tracing: Fix setting of start_index in find_next()" > Thanks as usual. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] trace: correct start_index in find_next
On 2015/12/31 13:11, Qiu, PeiyangX wrote: > From: Qiu Peiyang > > When we do cat /sys/kernel/debug/tracing/printk_formats, we hit kernel > panic at t_show. > > general protection fault: [#1] PREEMPT SMP > CPU: 0 PID: 2957 Comm: sh Tainted: G W O 3.14.55-x86_64-01062-gd4acdc7 #2 > RIP: 0010:[] > [] t_show+0x22/0xe0 > RSP: :88002b4ebe80 EFLAGS: 00010246 > RAX: RBX: RCX: 0004 > RDX: 0004 RSI: 81fd26a6 RDI: 880032f9f7b1 > RBP: 88002b4ebe98 R08: 1000 R09: ffec > R10: R11: 000f R12: 880004d9b6c0 > R13: 7365725f6d706400 R14: 880004d9b6c0 R15: 82020570 > FS: () GS:88003aa0(0063) knlGS:f776bc40 > CS: 0010 DS: 002b ES: 002b CR0: 80050033 > CR2: f6c02ff0 CR3: 2c2b3000 CR4: 001007f0 > Call Trace: > [] seq_read+0x2f6/0x3e0 > [] vfs_read+0x9b/0x160 > [] SyS_read+0x49/0xb0 > [] ia32_do_call+0x13/0x13 > ---[ end trace 5bd9eb630614861e ]--- > Kernel panic - not syncing: Fatal exception > > When the first time find_next calls find_next_mod_format, it should > iterate the trace_bprintk_fmt_list to find the first print format of > the module. However in current code, start_index is smaller than *pos > at first, and code will not iterate the list. Latter container_of will > get the wrong address with former v, which will cause mod_fmt be a > meaningless object and so is the returned mod_fmt->fmt. > > This patch will fix it by correcting the start_index. After fixed, > when the first time calls find_next_mod_format, start_index will be > equal to *pos, and code will iterate the trace_bprintk_fmt_list to > get the right module printk format, so is the returned mod_fmt->fmt. > > Signed-off-by: Qiu Peiyang > --- > kernel/trace/trace_printk.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/kernel/trace/trace_printk.c b/kernel/trace/trace_printk.c > index 1c2b285..64f0a26 100644 > --- a/kernel/trace/trace_printk.c > +++ b/kernel/trace/trace_printk.c > @@ -273,6 +273,7 @@ static const char **find_next(void *v, loff_t *pos) > if (*pos < last_index + start_index) > return __start___tracepoint_str + (*pos - last_index); > > + start_index += last_index; > return find_next_mod_format(start_index, v, fmt, pos); > } Rusty, Steven, How is this patch? It fixes a kernel panic. Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] trace: correct start_index in find_next
On 2015/12/31 13:11, Qiu, PeiyangX wrote: > From: Qiu Peiyang> > When we do cat /sys/kernel/debug/tracing/printk_formats, we hit kernel > panic at t_show. > > general protection fault: [#1] PREEMPT SMP > CPU: 0 PID: 2957 Comm: sh Tainted: G W O 3.14.55-x86_64-01062-gd4acdc7 #2 > RIP: 0010:[] > [] t_show+0x22/0xe0 > RSP: :88002b4ebe80 EFLAGS: 00010246 > RAX: RBX: RCX: 0004 > RDX: 0004 RSI: 81fd26a6 RDI: 880032f9f7b1 > RBP: 88002b4ebe98 R08: 1000 R09: ffec > R10: R11: 000f R12: 880004d9b6c0 > R13: 7365725f6d706400 R14: 880004d9b6c0 R15: 82020570 > FS: () GS:88003aa0(0063) knlGS:f776bc40 > CS: 0010 DS: 002b ES: 002b CR0: 80050033 > CR2: f6c02ff0 CR3: 2c2b3000 CR4: 001007f0 > Call Trace: > [] seq_read+0x2f6/0x3e0 > [] vfs_read+0x9b/0x160 > [] SyS_read+0x49/0xb0 > [] ia32_do_call+0x13/0x13 > ---[ end trace 5bd9eb630614861e ]--- > Kernel panic - not syncing: Fatal exception > > When the first time find_next calls find_next_mod_format, it should > iterate the trace_bprintk_fmt_list to find the first print format of > the module. However in current code, start_index is smaller than *pos > at first, and code will not iterate the list. Latter container_of will > get the wrong address with former v, which will cause mod_fmt be a > meaningless object and so is the returned mod_fmt->fmt. > > This patch will fix it by correcting the start_index. After fixed, > when the first time calls find_next_mod_format, start_index will be > equal to *pos, and code will iterate the trace_bprintk_fmt_list to > get the right module printk format, so is the returned mod_fmt->fmt. > > Signed-off-by: Qiu Peiyang > --- > kernel/trace/trace_printk.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/kernel/trace/trace_printk.c b/kernel/trace/trace_printk.c > index 1c2b285..64f0a26 100644 > --- a/kernel/trace/trace_printk.c > +++ b/kernel/trace/trace_printk.c > @@ -273,6 +273,7 @@ static const char **find_next(void *v, loff_t *pos) > if (*pos < last_index + start_index) > return __start___tracepoint_str + (*pos - last_index); > > + start_index += last_index; > return find_next_mod_format(start_index, v, fmt, pos); > } Rusty, Steven, How is this patch? It fixes a kernel panic. Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] trace: correct start_index in find_next
On 2016/1/7 11:50, Steven Rostedt wrote: > On Thu, 07 Jan 2016 10:56:56 +0800 > "Zhang, Yanmin" <yanmin_zh...@linux.intel.com> wrote: > >> How is this patch? It fixes a kernel panic. >> > Linus already pulled it. It's also marked for stable. See commit: > f36d1be2930ede0a1947686e1126ffda5d5ee1bb in Linus's tree. > > I renamed the subject slightly to: > > "tracing: Fix setting of start_index in find_next()" > Thanks as usual. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] module: deal with the failure of complete_formation
On 2016/1/6 9:29, Steven Rostedt wrote: > On Wed, 06 Jan 2016 09:14:42 +0800 > "Zhang, Yanmin" wrote: > > >> It's a good idea although ftrace_module_init_fail might be complicated. > How so? I posted the patch. All it does is to call > ftrace_release_mod(), which will only remove pages that are included > with the module. If the pages were not added yet, nothing is done. You are right. ftrace_release_mod can be used. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] module: deal with the failure of complete_formation
On 2016/1/6 9:01, Steven Rostedt wrote: > On Fri, 25 Dec 2015 15:03:13 +0800 > "Qiu, PeiyangX" wrote: > >> From: Qiu Peiyang >> >> complete_formation might fail. kernel need clean up >> ftrace records of the module. >> >> The patch fixes it by tuning the operation sequence in >> complete_formation. After complete_formation checks >> verify_export_symbols, call ftrace_module_init to init >> ftrace records. >> >> Signed-off-by: Qiu Peiyang >> Signed-off-by: Zhang Yanmin >> --- >> kernel/module.c | 9 + >> 1 file changed, 5 insertions(+), 4 deletions(-) >> >> diff --git a/kernel/module.c b/kernel/module.c >> index 8f051a1..0a67c4e 100644 >> --- a/kernel/module.c >> +++ b/kernel/module.c >> @@ -3373,6 +3373,11 @@ static int complete_formation(struct module *mod, >> struct load_info *info) >> /* This relies on module_mutex for list integrity. */ >> module_bug_finalize(info->hdr, info->sechdrs, mod); >> >> +mutex_unlock(_mutex); >> + > First of all, this is buggy. You can't just move the locking of > module_mutex. That is needed to modify mod->state. > > Second of all, you are solving this wrong. I guess we should add > ftrace_module_init_fail() function to cover the cases where the module > can fail before calling do_init_module(), as if that happens it does > not call the module going notifiers. It's a good idea although ftrace_module_init_fail might be complicated. Thanks, Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] module: deal with the failure of complete_formation
On 2016/1/6 9:01, Steven Rostedt wrote: > On Fri, 25 Dec 2015 15:03:13 +0800 > "Qiu, PeiyangX" <peiyangx@intel.com> wrote: > >> From: Qiu Peiyang <peiyangx@intel.com> >> >> complete_formation might fail. kernel need clean up >> ftrace records of the module. >> >> The patch fixes it by tuning the operation sequence in >> complete_formation. After complete_formation checks >> verify_export_symbols, call ftrace_module_init to init >> ftrace records. >> >> Signed-off-by: Qiu Peiyang <peiyangx@intel.com> >> Signed-off-by: Zhang Yanmin <yanmin.zh...@intel.com> >> --- >> kernel/module.c | 9 + >> 1 file changed, 5 insertions(+), 4 deletions(-) >> >> diff --git a/kernel/module.c b/kernel/module.c >> index 8f051a1..0a67c4e 100644 >> --- a/kernel/module.c >> +++ b/kernel/module.c >> @@ -3373,6 +3373,11 @@ static int complete_formation(struct module *mod, >> struct load_info *info) >> /* This relies on module_mutex for list integrity. */ >> module_bug_finalize(info->hdr, info->sechdrs, mod); >> >> +mutex_unlock(_mutex); >> + > First of all, this is buggy. You can't just move the locking of > module_mutex. That is needed to modify mod->state. > > Second of all, you are solving this wrong. I guess we should add > ftrace_module_init_fail() function to cover the cases where the module > can fail before calling do_init_module(), as if that happens it does > not call the module going notifiers. It's a good idea although ftrace_module_init_fail might be complicated. Thanks, Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] module: deal with the failure of complete_formation
On 2016/1/6 9:29, Steven Rostedt wrote: > On Wed, 06 Jan 2016 09:14:42 +0800 > "Zhang, Yanmin" <yanmin_zh...@linux.intel.com> wrote: > > >> It's a good idea although ftrace_module_init_fail might be complicated. > How so? I posted the patch. All it does is to call > ftrace_release_mod(), which will only remove pages that are included > with the module. If the pages were not added yet, nothing is done. You are right. ftrace_release_mod can be used. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ftrace: fix race between ftrace and insmod
On 2015/12/16 22:28, Steven Rostedt wrote: > On Wed, 16 Dec 2015 18:28:35 +0800 > "Zhang, Yanmin" wrote: > >>> + /* >>> +* If the tracing is enabled, go ahead and enable the record. >>> +* >>> +* The reason not to enable the record immediatelly is the >>> +* inherent check of ftrace_make_nop/ftrace_make_call for >>> +* correct previous instructions. Making first the NOP >>> +* conversion puts the module to the correct state, thus >>> +* passing the ftrace_make_call check. >>> +* >>> +* We also delay this to after the module code already set the >>> +* text to read-only, as we now need to set it back to read-write >>> +* so that we can modify the text. >>> +*/ >>> + if (ftrace_start_up) >>> + ftrace_arch_code_modify_prepare(); >>> + >>> + do_for_each_ftrace_rec(pg, rec) { >>> + int cnt; >>> + /* >>> +* do_for_each_ftrace_rec() is a double loop. >>> +* module text shares the pg. If a record is >>> +* not part of this module, then skip this pg, >>> +* which the "break" will do. >>> +*/ >>> + if (!within_module_core(rec->ip, mod)) >>> + break; >>> + >>> + cnt = 0; >>> + >>> + /* >>> +* When adding a module, we need to check if tracers are >>> +* currently enabled and if they are, and can trace this record, >>> +* we need to enable the module functions as well as update the >>> +* reference counts for those function records. >>> +*/ >>> + if (ftrace_start_up) >>> + cnt += referenced_filters(rec); >>> + >>> + /* This clears FTRACE_FL_DISABLED */ >>> + rec->flags = cnt; >>> + >>> + if (ftrace_start_up && cnt) { >>> + int failed = __ftrace_replace_code(rec, 1); >> If we choose to call ftrace_module_enable when receiving module notification >> MODULE_STATE_COMING, TEXT section of the module is already changed to RO. > And that's why we call ftrace_arch_code_modify_prepare(). That should > change all text to RW. Thanks for the kind pointer. We would add codes into your patch based on notifier and send patch to you by private email. Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ftrace: fix race between ftrace and insmod
On 2015/12/16 1:37, Steven Rostedt wrote: > On Tue, 15 Dec 2015 11:26:41 +0800 > "Zhang, Yanmin" wrote: > >>> This seems very hackish, although I can't think of a better way at the >>> moment. But I would like not to add more code into module.c if >>> possible, and just use a notifier unless there's a real reason we can't >>> (as there was when we added the hook in the first place). >>> We would double-check it again. It's not a good idea to change common >>> module codes. >> I double-checked it. If using notifier, we have to add 2 new module >> notification >> events in enum module_state. >> For example, MODULE_STATE_INITING, MODULE_STATE_INITING_FAILED. >> At MODULE_STATE_INITING, call ftrace_module_init(mod); >> At MODULE_STATE_INITING_FAILED, call ftrace_release_mod (or similar >> function); >> At MODULE_STATE_COMING, call a new function which links the new start_pg >> to ftrace_pages. >> Such design can also fix another bug in current kernel that ftrace start_pg >> of the module is not cleaned up if complete_formation fails. >> However, we change module common codes. The new events only serve ftrace. >> >> If not holding ftrace_mutex across start_pg setup and complete_formation, >> some other variables are not protected well, such like ftrace_pages, >> ftrace_start_up, and so on. > OK, that was basically the reason the hook was added in the first place. > > The reason the modules notifier doesn't work here is that there's too > many exits from the module code that may leave the mutex held. > > I thought about this some more. So the issue is that we add the module > functions to the ftrace records. Then another task enables function > tracing before the module is fully set up. As ftrace is enabling > function tracing, the module sets its text to read-only, outside the > scope of ftrace setting all text to read-write. When ftrace gets to the > module code, it fails to do the modifications and you get the > ftrace_bug() splat. > > Sounds right? > > Now, I really dislike the taking of the ftrace mutex and returning back > to module handling. That is very subtle, fragile and prone to bugs. I > took a step back to find another solution and I think I found one. > > Take the below patch and add to it (I'll keep this patch as mine, and > you can submit another patch to do the following). You don't need to > resend this patch, just send me a patch that does this: > > > Add the module notifier that calls ftrace_module_enable(mod), removing > it from ftrace_init_module(). Feel free to monkey with that function to > be able to be called directly if it can't already. > > What my patch does is to create a new ftrace record flag DISABLED, > which is set to all functions in a module as it is added. Then later on > (in the module notifier that you will add), the flag is cleared. In > between, the module functions will be ignored. > > > If the module errors out and the notifier is not called, we don't care. > The records will simply be removed, and the flag will be meaningless. > > I needed to move the code around to still let the module code get > enabled if tracing was already enabled. I even cleaned that code up as > I found it has gotten a bit messy. > > -- Steve > > diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h > index 4736a826baf5..660e7c698f3b 100644 > --- a/include/linux/ftrace.h > +++ b/include/linux/ftrace.h > @@ -357,6 +357,7 @@ bool is_ftrace_trampoline(unsigned long addr); > * REGS- the record wants the function to save regs > * REGS_EN - the function is set up to save regs. > * IPMODIFY - the record allows for the IP address to be changed. > + * DISABLED - the record is not ready to be touched yet > * > * When a new ftrace_ops is registered and wants a function to save > * pt_regs, the rec->flag REGS is set. When the function has been > @@ -371,10 +372,11 @@ enum { > FTRACE_FL_TRAMP = (1UL << 28), > FTRACE_FL_TRAMP_EN = (1UL << 27), > FTRACE_FL_IPMODIFY = (1UL << 26), > + FTRACE_FL_DISABLED = (1UL << 25), > }; > > -#define FTRACE_REF_MAX_SHIFT 26 > -#define FTRACE_FL_BITS 6 > +#define FTRACE_REF_MAX_SHIFT 25 > +#define FTRACE_FL_BITS 7 > #define FTRACE_FL_MASKED_BITS((1UL << FTRACE_FL_BITS) - 1) > #define FTRACE_FL_MASK (FTRACE_FL_MASKED_BITS << > FTRACE_REF_MAX_SHIFT) > #define FTRACE_REF_MAX ((1UL << FTRACE_REF_MAX_SHIFT) - 1) > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c > index e290a30f2d0b..e7e15874fb7c 1
Re: [PATCH] ftrace: fix race between ftrace and insmod
On 2015/12/16 22:28, Steven Rostedt wrote: > On Wed, 16 Dec 2015 18:28:35 +0800 > "Zhang, Yanmin" <yanmin_zh...@linux.intel.com> wrote: > >>> + /* >>> +* If the tracing is enabled, go ahead and enable the record. >>> +* >>> +* The reason not to enable the record immediatelly is the >>> +* inherent check of ftrace_make_nop/ftrace_make_call for >>> +* correct previous instructions. Making first the NOP >>> +* conversion puts the module to the correct state, thus >>> +* passing the ftrace_make_call check. >>> +* >>> +* We also delay this to after the module code already set the >>> +* text to read-only, as we now need to set it back to read-write >>> +* so that we can modify the text. >>> +*/ >>> + if (ftrace_start_up) >>> + ftrace_arch_code_modify_prepare(); >>> + >>> + do_for_each_ftrace_rec(pg, rec) { >>> + int cnt; >>> + /* >>> +* do_for_each_ftrace_rec() is a double loop. >>> +* module text shares the pg. If a record is >>> +* not part of this module, then skip this pg, >>> +* which the "break" will do. >>> +*/ >>> + if (!within_module_core(rec->ip, mod)) >>> + break; >>> + >>> + cnt = 0; >>> + >>> + /* >>> +* When adding a module, we need to check if tracers are >>> +* currently enabled and if they are, and can trace this record, >>> +* we need to enable the module functions as well as update the >>> +* reference counts for those function records. >>> +*/ >>> + if (ftrace_start_up) >>> + cnt += referenced_filters(rec); >>> + >>> + /* This clears FTRACE_FL_DISABLED */ >>> + rec->flags = cnt; >>> + >>> + if (ftrace_start_up && cnt) { >>> + int failed = __ftrace_replace_code(rec, 1); >> If we choose to call ftrace_module_enable when receiving module notification >> MODULE_STATE_COMING, TEXT section of the module is already changed to RO. > And that's why we call ftrace_arch_code_modify_prepare(). That should > change all text to RW. Thanks for the kind pointer. We would add codes into your patch based on notifier and send patch to you by private email. Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ftrace: fix race between ftrace and insmod
On 2015/12/16 1:37, Steven Rostedt wrote: > On Tue, 15 Dec 2015 11:26:41 +0800 > "Zhang, Yanmin" <yanmin_zh...@linux.intel.com> wrote: > >>> This seems very hackish, although I can't think of a better way at the >>> moment. But I would like not to add more code into module.c if >>> possible, and just use a notifier unless there's a real reason we can't >>> (as there was when we added the hook in the first place). >>> We would double-check it again. It's not a good idea to change common >>> module codes. >> I double-checked it. If using notifier, we have to add 2 new module >> notification >> events in enum module_state. >> For example, MODULE_STATE_INITING, MODULE_STATE_INITING_FAILED. >> At MODULE_STATE_INITING, call ftrace_module_init(mod); >> At MODULE_STATE_INITING_FAILED, call ftrace_release_mod (or similar >> function); >> At MODULE_STATE_COMING, call a new function which links the new start_pg >> to ftrace_pages. >> Such design can also fix another bug in current kernel that ftrace start_pg >> of the module is not cleaned up if complete_formation fails. >> However, we change module common codes. The new events only serve ftrace. >> >> If not holding ftrace_mutex across start_pg setup and complete_formation, >> some other variables are not protected well, such like ftrace_pages, >> ftrace_start_up, and so on. > OK, that was basically the reason the hook was added in the first place. > > The reason the modules notifier doesn't work here is that there's too > many exits from the module code that may leave the mutex held. > > I thought about this some more. So the issue is that we add the module > functions to the ftrace records. Then another task enables function > tracing before the module is fully set up. As ftrace is enabling > function tracing, the module sets its text to read-only, outside the > scope of ftrace setting all text to read-write. When ftrace gets to the > module code, it fails to do the modifications and you get the > ftrace_bug() splat. > > Sounds right? > > Now, I really dislike the taking of the ftrace mutex and returning back > to module handling. That is very subtle, fragile and prone to bugs. I > took a step back to find another solution and I think I found one. > > Take the below patch and add to it (I'll keep this patch as mine, and > you can submit another patch to do the following). You don't need to > resend this patch, just send me a patch that does this: > > > Add the module notifier that calls ftrace_module_enable(mod), removing > it from ftrace_init_module(). Feel free to monkey with that function to > be able to be called directly if it can't already. > > What my patch does is to create a new ftrace record flag DISABLED, > which is set to all functions in a module as it is added. Then later on > (in the module notifier that you will add), the flag is cleared. In > between, the module functions will be ignored. > > > If the module errors out and the notifier is not called, we don't care. > The records will simply be removed, and the flag will be meaningless. > > I needed to move the code around to still let the module code get > enabled if tracing was already enabled. I even cleaned that code up as > I found it has gotten a bit messy. > > -- Steve > > diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h > index 4736a826baf5..660e7c698f3b 100644 > --- a/include/linux/ftrace.h > +++ b/include/linux/ftrace.h > @@ -357,6 +357,7 @@ bool is_ftrace_trampoline(unsigned long addr); > * REGS- the record wants the function to save regs > * REGS_EN - the function is set up to save regs. > * IPMODIFY - the record allows for the IP address to be changed. > + * DISABLED - the record is not ready to be touched yet > * > * When a new ftrace_ops is registered and wants a function to save > * pt_regs, the rec->flag REGS is set. When the function has been > @@ -371,10 +372,11 @@ enum { > FTRACE_FL_TRAMP = (1UL << 28), > FTRACE_FL_TRAMP_EN = (1UL << 27), > FTRACE_FL_IPMODIFY = (1UL << 26), > + FTRACE_FL_DISABLED = (1UL << 25), > }; > > -#define FTRACE_REF_MAX_SHIFT 26 > -#define FTRACE_FL_BITS 6 > +#define FTRACE_REF_MAX_SHIFT 25 > +#define FTRACE_FL_BITS 7 > #define FTRACE_FL_MASKED_BITS((1UL << FTRACE_FL_BITS) - 1) > #define FTRACE_FL_MASK (FTRACE_FL_MASKED_BITS << > FTRACE_REF_MAX_SHIFT) > #define FTRACE_REF_MAX ((1UL << FTRACE_REF_MAX_SHIFT) - 1) > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c >
Re: [PATCH] ftrace: fix race between ftrace and insmod
On 2015/12/16 1:37, Steven Rostedt wrote: > On Tue, 15 Dec 2015 11:26:41 +0800 > "Zhang, Yanmin" wrote: > >>> This seems very hackish, although I can't think of a better way at the >>> moment. But I would like not to add more code into module.c if >>> possible, and just use a notifier unless there's a real reason we can't >>> (as there was when we added the hook in the first place). >>> We would double-check it again. It's not a good idea to change common >>> module codes. >> I double-checked it. If using notifier, we have to add 2 new module >> notification >> events in enum module_state. >> For example, MODULE_STATE_INITING, MODULE_STATE_INITING_FAILED. >> At MODULE_STATE_INITING, call ftrace_module_init(mod); >> At MODULE_STATE_INITING_FAILED, call ftrace_release_mod (or similar >> function); >> At MODULE_STATE_COMING, call a new function which links the new start_pg >> to ftrace_pages. >> Such design can also fix another bug in current kernel that ftrace start_pg >> of the module is not cleaned up if complete_formation fails. >> However, we change module common codes. The new events only serve ftrace. >> >> If not holding ftrace_mutex across start_pg setup and complete_formation, >> some other variables are not protected well, such like ftrace_pages, >> ftrace_start_up, and so on. > OK, that was basically the reason the hook was added in the first place. > > The reason the modules notifier doesn't work here is that there's too > many exits from the module code that may leave the mutex held. Yes. > > I thought about this some more. So the issue is that we add the module > functions to the ftrace records. Then another task enables function > tracing before the module is fully set up. As ftrace is enabling > function tracing, the module sets its text to read-only, outside the > scope of ftrace setting all text to read-write. When ftrace gets to the > module code, it fails to do the modifications and you get the > ftrace_bug() splat. > > Sounds right? Right, indeed. My case is worse than that as my android uses kernel 3.14.50, which has another bug around remove_breakpoint as it calls probe_kernel_write directly. The latest upstream has no such issue. So we just need focus on the race you explain well. > > Now, I really dislike the taking of the ftrace mutex and returning back > to module handling. That is very subtle, fragile and prone to bugs. Yes, just like that the current codes are not perfect. :) > I > took a step back to find another solution and I think I found one. > > Take the below patch and add to it (I'll keep this patch as mine, and > you can submit another patch to do the following). You don't need to > resend this patch, just send me a patch that does this: Ok. > > > Add the module notifier that calls ftrace_module_enable(mod), removing > it from ftrace_init_module(). Feel free to monkey with that function to > be able to be called directly if it can't already. > > What my patch does is to create a new ftrace record flag DISABLED, > which is set to all functions in a module as it is added. Then later on > (in the module notifier that you will add), the flag is cleared. In > between, the module functions will be ignored. It's a good idea. > > If the module errors out and the notifier is not called, we don't care. > The records will simply be removed, and the flag will be meaningless. One question: if complete_formation fails, load_module wouldn't send any notification and ftrace_release_mod is not called. How can ftrace core remove all pg->records of the module? If complete_formation succeeds, further calls cause module errors out, load_module would send MODULE_STATE_GOING, ftrace_release_mod is called. Thanks for your quick response. Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ftrace: fix race between ftrace and insmod
On 2015/12/16 1:37, Steven Rostedt wrote: > On Tue, 15 Dec 2015 11:26:41 +0800 > "Zhang, Yanmin" <yanmin_zh...@linux.intel.com> wrote: > >>> This seems very hackish, although I can't think of a better way at the >>> moment. But I would like not to add more code into module.c if >>> possible, and just use a notifier unless there's a real reason we can't >>> (as there was when we added the hook in the first place). >>> We would double-check it again. It's not a good idea to change common >>> module codes. >> I double-checked it. If using notifier, we have to add 2 new module >> notification >> events in enum module_state. >> For example, MODULE_STATE_INITING, MODULE_STATE_INITING_FAILED. >> At MODULE_STATE_INITING, call ftrace_module_init(mod); >> At MODULE_STATE_INITING_FAILED, call ftrace_release_mod (or similar >> function); >> At MODULE_STATE_COMING, call a new function which links the new start_pg >> to ftrace_pages. >> Such design can also fix another bug in current kernel that ftrace start_pg >> of the module is not cleaned up if complete_formation fails. >> However, we change module common codes. The new events only serve ftrace. >> >> If not holding ftrace_mutex across start_pg setup and complete_formation, >> some other variables are not protected well, such like ftrace_pages, >> ftrace_start_up, and so on. > OK, that was basically the reason the hook was added in the first place. > > The reason the modules notifier doesn't work here is that there's too > many exits from the module code that may leave the mutex held. Yes. > > I thought about this some more. So the issue is that we add the module > functions to the ftrace records. Then another task enables function > tracing before the module is fully set up. As ftrace is enabling > function tracing, the module sets its text to read-only, outside the > scope of ftrace setting all text to read-write. When ftrace gets to the > module code, it fails to do the modifications and you get the > ftrace_bug() splat. > > Sounds right? Right, indeed. My case is worse than that as my android uses kernel 3.14.50, which has another bug around remove_breakpoint as it calls probe_kernel_write directly. The latest upstream has no such issue. So we just need focus on the race you explain well. > > Now, I really dislike the taking of the ftrace mutex and returning back > to module handling. That is very subtle, fragile and prone to bugs. Yes, just like that the current codes are not perfect. :) > I > took a step back to find another solution and I think I found one. > > Take the below patch and add to it (I'll keep this patch as mine, and > you can submit another patch to do the following). You don't need to > resend this patch, just send me a patch that does this: Ok. > > > Add the module notifier that calls ftrace_module_enable(mod), removing > it from ftrace_init_module(). Feel free to monkey with that function to > be able to be called directly if it can't already. > > What my patch does is to create a new ftrace record flag DISABLED, > which is set to all functions in a module as it is added. Then later on > (in the module notifier that you will add), the flag is cleared. In > between, the module functions will be ignored. It's a good idea. > > If the module errors out and the notifier is not called, we don't care. > The records will simply be removed, and the flag will be meaningless. One question: if complete_formation fails, load_module wouldn't send any notification and ftrace_release_mod is not called. How can ftrace core remove all pg->records of the module? If complete_formation succeeds, further calls cause module errors out, load_module would send MODULE_STATE_GOING, ftrace_release_mod is called. Thanks for your quick response. Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ftrace: fix race between ftrace and insmod
On 2015/12/15 9:05, Zhang, Yanmin wrote: > On 2015/12/14 23:51, Steven Rostedt wrote: >> On Mon, 14 Dec 2015 11:16:18 +0800 >> "Qiu, PeiyangX" wrote: >> >>> We hit ftrace_bug report when booting Android on a 64bit ATOM SOC chip. >>> Basically, there is a race between insmod and ftrace_run_update_code. >>> >>> After load_module=>ftrace_module_init, another thread jumps in to call >>> ftrace_run_update_code=>ftrace_arch_code_modify_prepare >>> =>set_all_modules_text_rw, to change all modules >>> as RW. Since the new module is at MODULE_STATE_UNFORMED, the text attribute >>> is not changed. Then, the 2nd thread goes ahead to change codes. >>> However, load_module continues to call >>> complete_formation=>set_section_ro_nx, >>> then 2nd thread would fail when probing the module's TEXT. >>> >>> Below patch tries to resolve it. >>> >>> commit a949ae560a511fe4e3adf48fa44fefded93e5c2b >>> Author: Steven Rostedt (Red Hat) >>> Date: Thu Apr 24 10:40:12 2014 -0400 >>> >>> ftrace/module: Hardcode ftrace_module_init() call into load_module() >>> >>> But it can't fully resolve the issue. >>> >>> THis patch holds ftrace_mutex across ftrace_module_init and >>> complete_formation. >>> >>> Signed-off-by: Qiu Peiyang >>> Signed-off-by: Zhang Yanmin >> First, this patch has major whitespace damage. All tabs are now spaces! > This seems very hackish, although I can't think of a better way at the > moment. But I would like not to add more code into module.c if > possible, and just use a notifier unless there's a real reason we can't > (as there was when we added the hook in the first place). > We would double-check it again. It's not a good idea to change common > module codes. I double-checked it. If using notifier, we have to add 2 new module notification events in enum module_state. For example, MODULE_STATE_INITING, MODULE_STATE_INITING_FAILED. At MODULE_STATE_INITING, call ftrace_module_init(mod); At MODULE_STATE_INITING_FAILED, call ftrace_release_mod (or similar function); At MODULE_STATE_COMING, call a new function which links the new start_pg to ftrace_pages. Such design can also fix another bug in current kernel that ftrace start_pg of the module is not cleaned up if complete_formation fails. However, we change module common codes. The new events only serve ftrace. If not holding ftrace_mutex across start_pg setup and complete_formation, some other variables are not protected well, such like ftrace_pages, ftrace_start_up, and so on. Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ftrace: fix race between ftrace and insmod
On 2015/12/14 23:51, Steven Rostedt wrote: > On Mon, 14 Dec 2015 11:16:18 +0800 > "Qiu, PeiyangX" wrote: > >> We hit ftrace_bug report when booting Android on a 64bit ATOM SOC chip. >> Basically, there is a race between insmod and ftrace_run_update_code. >> >> After load_module=>ftrace_module_init, another thread jumps in to call >> ftrace_run_update_code=>ftrace_arch_code_modify_prepare >> =>set_all_modules_text_rw, to change all modules >> as RW. Since the new module is at MODULE_STATE_UNFORMED, the text attribute >> is not changed. Then, the 2nd thread goes ahead to change codes. >> However, load_module continues to call >> complete_formation=>set_section_ro_nx, >> then 2nd thread would fail when probing the module's TEXT. >> >> Below patch tries to resolve it. >> >> commit a949ae560a511fe4e3adf48fa44fefded93e5c2b >> Author: Steven Rostedt (Red Hat) >> Date: Thu Apr 24 10:40:12 2014 -0400 >> >> ftrace/module: Hardcode ftrace_module_init() call into load_module() >> >> But it can't fully resolve the issue. >> >> THis patch holds ftrace_mutex across ftrace_module_init and >> complete_formation. >> >> Signed-off-by: Qiu Peiyang >> Signed-off-by: Zhang Yanmin > First, this patch has major whitespace damage. All tabs are now spaces! Sorry. That's really bad. It's the 1st time for Peiyang to send patch to LKML. He would fix such issue by reconfiguring email client. > >> --- >> include/linux/ftrace.h | 6 -- >> kernel/module.c| 3 ++- >> kernel/trace/ftrace.c | 33 ++--- >> 3 files changed, 28 insertions(+), 14 deletions(-) >> >> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h >> index eae6548..4adc279 100644 >> --- a/include/linux/ftrace.h >> +++ b/include/linux/ftrace.h >> @@ -585,7 +585,8 @@ static inline int ftrace_modify_call(struct >> dyn_ftrace *rec, unsigned long old_a >> extern int ftrace_arch_read_dyn_info(char *buf, int size); >> >> extern int skip_trace(unsigned long ip); >> -extern void ftrace_module_init(struct module *mod); >> +extern void ftrace_module_init_start(struct module *mod); >> +extern void ftrace_module_init_end(struct module *mod); >> >> extern void ftrace_disable_daemon(void); >> extern void ftrace_enable_daemon(void); >> @@ -595,7 +596,8 @@ static inline int ftrace_force_update(void) { return >> 0; } >> static inline void ftrace_disable_daemon(void) { } >> static inline void ftrace_enable_daemon(void) { } >> static inline void ftrace_release_mod(struct module *mod) {} >> -static inline void ftrace_module_init(struct module *mod) {} >> +static inline void ftrace_module_init_start(struct module *mod) {} >> +static inline void ftrace_module_init_end(struct module *mod) {} >> static inline __init int register_ftrace_command(struct >> ftrace_func_command *cmd) >> { >> return -EINVAL; >> diff --git a/kernel/module.c b/kernel/module.c >> index 8f051a1..324c5c6 100644 >> --- a/kernel/module.c >> +++ b/kernel/module.c >> @@ -3506,10 +3506,11 @@ static int load_module(struct load_info *info, >> const char __user *uargs, >> dynamic_debug_setup(info->debug, info->num_debug); >> >> /* Ftrace init must be called in the MODULE_STATE_UNFORMED state */ >> -ftrace_module_init(mod); >> +ftrace_module_init_start(mod); >> >> /* Finally it's fully formed, ready to start executing. */ >> err = complete_formation(mod, info); >> +ftrace_module_init_end(mod); > Instead of modifying the module code even more, why not just put this > into the module notifier when done? > > Then this would only need to change the ftrace code. Although, holding > a lock throughout seems rather hacky. We did consider notifier. When ftrace_module_init is called, start_pg of the module is linked to ftrace_pages. To fix the issue, we have to either add a new var to struct module to save its start_pg, or change ftrace to save module start_pg before MODULE_STATE_COMING notification is delivered. In addition, function complete_formation need consider cleanup the start_pg if init fails. > > > >> if (err) >> goto ddebug_cleanup; >> >> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c >> index 3f743b1..436c199 100644 >> --- a/kernel/trace/ftrace.c >> +++ b/kernel/trace/ftrace.c >> @@ -4795,7 +4795,7 @@ static int ftrace_cmp_ips(const void *a, const >> void *b) >&
Re: [PATCH] ftrace: fix race between ftrace and insmod
On 2015/12/14 23:51, Steven Rostedt wrote: > On Mon, 14 Dec 2015 11:16:18 +0800 > "Qiu, PeiyangX" <peiyangx@intel.com> wrote: > >> We hit ftrace_bug report when booting Android on a 64bit ATOM SOC chip. >> Basically, there is a race between insmod and ftrace_run_update_code. >> >> After load_module=>ftrace_module_init, another thread jumps in to call >> ftrace_run_update_code=>ftrace_arch_code_modify_prepare >> =>set_all_modules_text_rw, to change all modules >> as RW. Since the new module is at MODULE_STATE_UNFORMED, the text attribute >> is not changed. Then, the 2nd thread goes ahead to change codes. >> However, load_module continues to call >> complete_formation=>set_section_ro_nx, >> then 2nd thread would fail when probing the module's TEXT. >> >> Below patch tries to resolve it. >> >> commit a949ae560a511fe4e3adf48fa44fefded93e5c2b >> Author: Steven Rostedt (Red Hat) <rost...@goodmis.org> >> Date: Thu Apr 24 10:40:12 2014 -0400 >> >> ftrace/module: Hardcode ftrace_module_init() call into load_module() >> >> But it can't fully resolve the issue. >> >> THis patch holds ftrace_mutex across ftrace_module_init and >> complete_formation. >> >> Signed-off-by: Qiu Peiyang <peiyangx@intel.com> >> Signed-off-by: Zhang Yanmin <yanmin.zh...@intel.com> > First, this patch has major whitespace damage. All tabs are now spaces! Sorry. That's really bad. It's the 1st time for Peiyang to send patch to LKML. He would fix such issue by reconfiguring email client. > >> --- >> include/linux/ftrace.h | 6 -- >> kernel/module.c| 3 ++- >> kernel/trace/ftrace.c | 33 ++--- >> 3 files changed, 28 insertions(+), 14 deletions(-) >> >> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h >> index eae6548..4adc279 100644 >> --- a/include/linux/ftrace.h >> +++ b/include/linux/ftrace.h >> @@ -585,7 +585,8 @@ static inline int ftrace_modify_call(struct >> dyn_ftrace *rec, unsigned long old_a >> extern int ftrace_arch_read_dyn_info(char *buf, int size); >> >> extern int skip_trace(unsigned long ip); >> -extern void ftrace_module_init(struct module *mod); >> +extern void ftrace_module_init_start(struct module *mod); >> +extern void ftrace_module_init_end(struct module *mod); >> >> extern void ftrace_disable_daemon(void); >> extern void ftrace_enable_daemon(void); >> @@ -595,7 +596,8 @@ static inline int ftrace_force_update(void) { return >> 0; } >> static inline void ftrace_disable_daemon(void) { } >> static inline void ftrace_enable_daemon(void) { } >> static inline void ftrace_release_mod(struct module *mod) {} >> -static inline void ftrace_module_init(struct module *mod) {} >> +static inline void ftrace_module_init_start(struct module *mod) {} >> +static inline void ftrace_module_init_end(struct module *mod) {} >> static inline __init int register_ftrace_command(struct >> ftrace_func_command *cmd) >> { >> return -EINVAL; >> diff --git a/kernel/module.c b/kernel/module.c >> index 8f051a1..324c5c6 100644 >> --- a/kernel/module.c >> +++ b/kernel/module.c >> @@ -3506,10 +3506,11 @@ static int load_module(struct load_info *info, >> const char __user *uargs, >> dynamic_debug_setup(info->debug, info->num_debug); >> >> /* Ftrace init must be called in the MODULE_STATE_UNFORMED state */ >> -ftrace_module_init(mod); >> +ftrace_module_init_start(mod); >> >> /* Finally it's fully formed, ready to start executing. */ >> err = complete_formation(mod, info); >> +ftrace_module_init_end(mod); > Instead of modifying the module code even more, why not just put this > into the module notifier when done? > > Then this would only need to change the ftrace code. Although, holding > a lock throughout seems rather hacky. We did consider notifier. When ftrace_module_init is called, start_pg of the module is linked to ftrace_pages. To fix the issue, we have to either add a new var to struct module to save its start_pg, or change ftrace to save module start_pg before MODULE_STATE_COMING notification is delivered. In addition, function complete_formation need consider cleanup the start_pg if init fails. > > > >> if (err) >> goto ddebug_cleanup; >> >> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c >> index 3f743b1..436c199 100644 >> --- a/kernel/trace/ftrace.c >> +++ b/kernel/trace/ftrace.c &
Re: [PATCH] ftrace: fix race between ftrace and insmod
On 2015/12/15 9:05, Zhang, Yanmin wrote: > On 2015/12/14 23:51, Steven Rostedt wrote: >> On Mon, 14 Dec 2015 11:16:18 +0800 >> "Qiu, PeiyangX" <peiyangx@intel.com> wrote: >> >>> We hit ftrace_bug report when booting Android on a 64bit ATOM SOC chip. >>> Basically, there is a race between insmod and ftrace_run_update_code. >>> >>> After load_module=>ftrace_module_init, another thread jumps in to call >>> ftrace_run_update_code=>ftrace_arch_code_modify_prepare >>> =>set_all_modules_text_rw, to change all modules >>> as RW. Since the new module is at MODULE_STATE_UNFORMED, the text attribute >>> is not changed. Then, the 2nd thread goes ahead to change codes. >>> However, load_module continues to call >>> complete_formation=>set_section_ro_nx, >>> then 2nd thread would fail when probing the module's TEXT. >>> >>> Below patch tries to resolve it. >>> >>> commit a949ae560a511fe4e3adf48fa44fefded93e5c2b >>> Author: Steven Rostedt (Red Hat) <rost...@goodmis.org> >>> Date: Thu Apr 24 10:40:12 2014 -0400 >>> >>> ftrace/module: Hardcode ftrace_module_init() call into load_module() >>> >>> But it can't fully resolve the issue. >>> >>> THis patch holds ftrace_mutex across ftrace_module_init and >>> complete_formation. >>> >>> Signed-off-by: Qiu Peiyang <peiyangx@intel.com> >>> Signed-off-by: Zhang Yanmin <yanmin.zh...@intel.com> >> First, this patch has major whitespace damage. All tabs are now spaces! > This seems very hackish, although I can't think of a better way at the > moment. But I would like not to add more code into module.c if > possible, and just use a notifier unless there's a real reason we can't > (as there was when we added the hook in the first place). > We would double-check it again. It's not a good idea to change common > module codes. I double-checked it. If using notifier, we have to add 2 new module notification events in enum module_state. For example, MODULE_STATE_INITING, MODULE_STATE_INITING_FAILED. At MODULE_STATE_INITING, call ftrace_module_init(mod); At MODULE_STATE_INITING_FAILED, call ftrace_release_mod (or similar function); At MODULE_STATE_COMING, call a new function which links the new start_pg to ftrace_pages. Such design can also fix another bug in current kernel that ftrace start_pg of the module is not cleaned up if complete_formation fails. However, we change module common codes. The new events only serve ftrace. If not holding ftrace_mutex across start_pg setup and complete_formation, some other variables are not protected well, such like ftrace_pages, ftrace_start_up, and so on. Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] smpboot.c: move setup_vector_irq after set_cpu_online
On 2015/7/2 15:02, Xiao, Jin wrote: > Hi Joerg, > > On 7/2/2015 2:52 PM, Joerg Roedel wrote: >> Hi Jin, >> >> On Thu, Jul 02, 2015 at 12:24:34PM +0800, xiao jin wrote: >>> [ 106.107851] BUG: unable to handle kernel NULL pointer dereference at >>> 0040 >>> [ 106.116702] IP: >>> [ 106.118490] [] >>> check_irq_vectors_for_cpu_disable+0x76/0x180 >> This might be the same issue I fixed with: >> >> d97eb8966c91 x86/irq: Check for valid irq descriptor in >> check_irq_vectors_for_cpu_disable() >> >> Can you try whether applying this patch on your kernel fixes your >> issue? >> > Yes, I agree d97eb8966c91 fix the IPANIC. I just notice a description from > http://lkml.kernel.org/r/20150204132754.ga10...@suse.de. > > I wish to share the debug result and root cause from my side. commit d97eb8966c91f2c9d05f0a22eb89ed5b76d966d1 looks like a workaround. Could Jin's patch be merged also? free_msi_irqs does have a race with smp_callin=>..=>__setup_vector_irq. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] smpboot.c: move setup_vector_irq after set_cpu_online
On 2015/7/2 15:02, Xiao, Jin wrote: Hi Joerg, On 7/2/2015 2:52 PM, Joerg Roedel wrote: Hi Jin, On Thu, Jul 02, 2015 at 12:24:34PM +0800, xiao jin wrote: [ 106.107851] BUG: unable to handle kernel NULL pointer dereference at 0040 [ 106.116702] IP: [ 106.118490] [810044f6] check_irq_vectors_for_cpu_disable+0x76/0x180 This might be the same issue I fixed with: d97eb8966c91 x86/irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable() Can you try whether applying this patch on your kernel fixes your issue? Yes, I agree d97eb8966c91 fix the IPANIC. I just notice a description from http://lkml.kernel.org/r/20150204132754.ga10...@suse.de. I wish to share the debug result and root cause from my side. commit d97eb8966c91f2c9d05f0a22eb89ed5b76d966d1 looks like a workaround. Could Jin's patch be merged also? free_msi_irqs does have a race with smp_callin=..=__setup_vector_irq. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] slub/slab: fix kmemleak didn't work on some case
On 2015/6/9 23:03, Catalin Marinas wrote: > On Tue, Jun 09, 2015 at 09:10:45AM +0100, Zhang, Yanmin wrote: >> On 2015/6/8 18:13, Catalin Marinas wrote: >>> As I replied already, I don't think this is that bad, or at least not >>> worse than what kmemleak already does (looking at all data whether it's >>> pointer or not). >> It depends. As for memleak, developers prefers there are false alarms instead >> of missing some leaked memory. > Lots of false positives aren't that nice, you spend a lot of time > debugging them (I've been there in the early kmemleak days). Anyway, > your use case is not about false positives vs. negatives but just false > negatives. > > My point is that there is a lot of random, pointer-like data read by > kmemleak even without this memset (e.g. thread stacks, non-pointer data > in kmalloc'ed structures, data/bss sections). Just doing this memset may > reduce the chance of false negatives a bit but I don't think it would be > noticeable. > > If there is some serious memory leak (lots of objects), they would > likely show up at some point. Even if it's a one-off leak, it's possible > that it shows up after some time (e.g. the object pointing to this > memory block is freed). > >>> It also doesn't solve the kmem_cache_alloc() case where >>> the original object size is no longer available. >> Such issue around kmem_cache_alloc() case happens only when the >> caller doesn't initialize or use the full object, so the object keeps >> old dirty data. > The kmem_cache blocks size would be aligned to a cache line, so you > still have some extra bytes never touched by the caller. > >> This patch is to resolve the redundant unused space (more than object size) >> although the full object is used by kernel. > So this solves only the cases where the original object size is still > known (e.g. kmalloc). It could also be solved by telling kmemleak the > actual object size. Your explanation is reasonable. The patch is for debug purpose. Maintainers can make decision based on balance. Xinwu is a new developer in kernel community. Accepting the patch into kernel can encourage him definitely. :) Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] slub/slab: fix kmemleak didn't work on some case
On 2015/6/9 23:03, Catalin Marinas wrote: On Tue, Jun 09, 2015 at 09:10:45AM +0100, Zhang, Yanmin wrote: On 2015/6/8 18:13, Catalin Marinas wrote: As I replied already, I don't think this is that bad, or at least not worse than what kmemleak already does (looking at all data whether it's pointer or not). It depends. As for memleak, developers prefers there are false alarms instead of missing some leaked memory. Lots of false positives aren't that nice, you spend a lot of time debugging them (I've been there in the early kmemleak days). Anyway, your use case is not about false positives vs. negatives but just false negatives. My point is that there is a lot of random, pointer-like data read by kmemleak even without this memset (e.g. thread stacks, non-pointer data in kmalloc'ed structures, data/bss sections). Just doing this memset may reduce the chance of false negatives a bit but I don't think it would be noticeable. If there is some serious memory leak (lots of objects), they would likely show up at some point. Even if it's a one-off leak, it's possible that it shows up after some time (e.g. the object pointing to this memory block is freed). It also doesn't solve the kmem_cache_alloc() case where the original object size is no longer available. Such issue around kmem_cache_alloc() case happens only when the caller doesn't initialize or use the full object, so the object keeps old dirty data. The kmem_cache blocks size would be aligned to a cache line, so you still have some extra bytes never touched by the caller. This patch is to resolve the redundant unused space (more than object size) although the full object is used by kernel. So this solves only the cases where the original object size is still known (e.g. kmalloc). It could also be solved by telling kmemleak the actual object size. Your explanation is reasonable. The patch is for debug purpose. Maintainers can make decision based on balance. Xinwu is a new developer in kernel community. Accepting the patch into kernel can encourage him definitely. :) Yanmin -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] slub/slab: fix kmemleak didn't work on some case
On 2015/6/8 18:13, Catalin Marinas wrote: > On Mon, Jun 08, 2015 at 10:38:13AM +0100, Christoph Lameter wrote: >> On Mon, 8 Jun 2015, Liu, XinwuX wrote: >> >>> when kernel uses kmalloc to allocate memory, slub/slab will find >>> a suitable kmem_cache. Ususally the cache's object size is often >>> greater than requested size. There is unused space which contains >>> dirty data. These dirty data might have pointers pointing to a block >> dirty? In what sense? > I guess XinwuX meant uninitialised. Uninitialized or dirty data used before being freed. > >>> of leaked memory. Kernel wouldn't consider this memory as leaked when >>> scanning kmemleak object. >> This has never been considered leaked memory before to my knowledge and >> the data is already initialized. > It's not the object being allocated that is considered leaked. But > uninitialised data in this object is scanned by kmemleak and it may look > like valid pointers to real leaked objects. So such data increases the > number of kmemleak false negatives. Yes, indeed. > > As I replied already, I don't think this is that bad, or at least not > worse than what kmemleak already does (looking at all data whether it's > pointer or not). It depends. As for memleak, developers prefers there are false alarms instead of missing some leaked memory. > It also doesn't solve the kmem_cache_alloc() case where > the original object size is no longer available. Such issue around kmem_cache_alloc() case happens only when the caller doesn't initialize or use the full object, so the object keeps old dirty data. This patch is to resolve the redundant unused space (more than object size) although the full object is used by kernel. > >> F.e. The zeroing function in linux/mm/slub.c::slab_alloc_node() zeros the >> complete object and not only the number of bytes specified in the kmalloc >> call. Same thing is true for SLAB. > But that's only when __GFP_ZERO is passed. > Thanks for the kind comments. There is a balance between performance (new memset consumes time) and debug capability. Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] slub/slab: fix kmemleak didn't work on some case
On 2015/6/8 18:13, Catalin Marinas wrote: On Mon, Jun 08, 2015 at 10:38:13AM +0100, Christoph Lameter wrote: On Mon, 8 Jun 2015, Liu, XinwuX wrote: when kernel uses kmalloc to allocate memory, slub/slab will find a suitable kmem_cache. Ususally the cache's object size is often greater than requested size. There is unused space which contains dirty data. These dirty data might have pointers pointing to a block dirty? In what sense? I guess XinwuX meant uninitialised. Uninitialized or dirty data used before being freed. of leaked memory. Kernel wouldn't consider this memory as leaked when scanning kmemleak object. This has never been considered leaked memory before to my knowledge and the data is already initialized. It's not the object being allocated that is considered leaked. But uninitialised data in this object is scanned by kmemleak and it may look like valid pointers to real leaked objects. So such data increases the number of kmemleak false negatives. Yes, indeed. As I replied already, I don't think this is that bad, or at least not worse than what kmemleak already does (looking at all data whether it's pointer or not). It depends. As for memleak, developers prefers there are false alarms instead of missing some leaked memory. It also doesn't solve the kmem_cache_alloc() case where the original object size is no longer available. Such issue around kmem_cache_alloc() case happens only when the caller doesn't initialize or use the full object, so the object keeps old dirty data. This patch is to resolve the redundant unused space (more than object size) although the full object is used by kernel. F.e. The zeroing function in linux/mm/slub.c::slab_alloc_node() zeros the complete object and not only the number of bytes specified in the kmalloc call. Same thing is true for SLAB. But that's only when __GFP_ZERO is passed. Thanks for the kind comments. There is a balance between performance (new memset consumes time) and debug capability. Yanmin -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V3 0/3] cdc-acm: fix incorrect runtime wakeup in acm_tty_write
On 2015/5/27 12:13, Zhang, Yanmin wrote: > Resend as V1/V2 have email format issue. Sorry for bothering. Greg, We have to abandon this patchset. Zhuang Jin Can, a USB expert, reviewed the patches. He says acm_tty_write already considers it carefully. acm_tty_write puts the urb to a delayed queue when acm->susp_count is not 0. acm_suspend adds 1 to acm->susp_count and acm_resume decreases 1 from it . Sorry for the bothering. Also thank Jin Can for the comments. Yanmin > > I use Thunderbird. It has no a button to enable LKML email simply. :) > > V3: Change email config to resend. > Add a space in comment. > > --- > > There is a scenario about cdc-acm utilization.Application opens > n_gsm tty and cdc-acm tty. cdc-acm tty connects to xhci device. > The application configures cdc-adm tty to n_gsm tty as ldisc tty. > > n_gsm=>cdc-acm=>xhci driver > > acm_tty_write can be called from n_gsm driver by ldisc connection, > and from application when application opens cdc-acm tty directly. > acm_tty_write wakes up the device by calling usb_autopm_get_interface_async, > which calls pm_runtime_get. However, pm_runtime_get can't wake up > the device before returning as it's an async wake up. Then, acm_tty_write > might access the device when it is off. > > The patchset fixes it by: > 1) add a new function usb_autopm_get_interface_upgrade to deal with > above 2 requirements; > 2) wake up device in n_gsm driver if n_gsm drivers calls cdc-acm driver; > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH V3 1/3] usb: add function usb_autopm_get_interface_upgrade
From: Zhang Yanmin Some usb driver has a specific requirement. Their critical functions might be called under both atomic environment and non-atomic environment. If it's under atomic environment, the driver can wake up the device by calling pm_runtime_get_sync directly. If it's under non-atomic environment, the function's caller need wake up the device before the function accesses the device. The patch adds usb_autopm_get_interface_upgrade, a new function to support above capability. Signed-off-by: Zhang Yanmin --- drivers/usb/core/driver.c | 54 +++ include/linux/usb.h | 3 +++ 2 files changed, 57 insertions(+) diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c index 818369a..5d6f9ee 100644 --- a/drivers/usb/core/driver.c +++ b/drivers/usb/core/driver.c @@ -1684,6 +1684,60 @@ int usb_autopm_get_interface(struct usb_interface *intf) EXPORT_SYMBOL_GPL(usb_autopm_get_interface); /** + * usb_autopm_get_interface_upgrade - increment a USB interface's PM-usage + * counter + * @intf: the usb_interface whose counter should be incremented + * + * This routine should be called by an interface driver when it wants to + * use @intf and needs to guarantee that it is not suspended. In addition, + * the routine prevents @intf from being autosuspended subsequently. (Note + * that this will not prevent suspend events originating in the PM core.) + * This prevention will persist until usb_autopm_put_interface() is called + * or @intf is unbound. A typical example would be a character-device + * driver when its device file is opened. + * + * Comparing with usb_autopm_get_interface, usb_autopm_get_interface_upgrade + * is more careful when resuming the device. + * 1) The caller's caller already resumes resume the device and hold spinlocks. + * usb_autopm_get_interface_upgrade couldn't call pm_runtime_get_sync; + * 2) The caller's caller doesn't resume the device. + * usb_autopm_get_interface_upgrade has to resume the device before going ahead. + * + * @intf's usage counter is incremented to prevent subsequent autosuspends. + * However if the autoresume fails then the counter is re-decremented. + * + * This routine can run only in process context. + * + * Return: 0 on success. + */ +int usb_autopm_get_interface_upgrade(struct usb_interface *intf) +{ + int status = 0; + + pm_runtime_get(>dev); + if (!pm_runtime_active(>dev)) { + /* If not active, next _get_sync wakes device up*/ + status = pm_runtime_get_sync(>dev); + /* +* If it's active, next _put_sync wouldn't +* really put it to sleep as the 1st _get +* keeps the device active. +*/ + pm_runtime_put_sync(>dev); + if (status < 0) + pm_runtime_put(>dev); + } else + atomic_inc(>pm_usage_cnt); + dev_vdbg(>dev, "%s: cnt %d -> %d\n", + __func__, atomic_read(>dev.power.usage_count), + status); + if (status > 0) + status = 0; + return status; +} +EXPORT_SYMBOL_GPL(usb_autopm_get_interface_upgrade); + +/** * usb_autopm_get_interface_async - increment a USB interface's PM-usage counter * @intf: the usb_interface whose counter should be incremented * diff --git a/include/linux/usb.h b/include/linux/usb.h index 447fe29..0a8a44a 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -663,6 +663,7 @@ extern void usb_enable_autosuspend(struct usb_device *udev); extern void usb_disable_autosuspend(struct usb_device *udev); extern int usb_autopm_get_interface(struct usb_interface *intf); +extern int usb_autopm_get_interface_upgrade(struct usb_interface *intf); extern void usb_autopm_put_interface(struct usb_interface *intf); extern int usb_autopm_get_interface_async(struct usb_interface *intf); extern void usb_autopm_put_interface_async(struct usb_interface *intf); @@ -683,6 +684,8 @@ static inline int usb_disable_autosuspend(struct usb_device *udev) static inline int usb_autopm_get_interface(struct usb_interface *intf) { return 0; } +static inline int usb_autopm_get_interface_upgrade(struct usb_interface *intf) +{ return 0; } static inline int usb_autopm_get_interface_async(struct usb_interface *intf) { return 0; } -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH V3 3/3] n_gsm: wake up ldisc tty before using it
From: Zhang Yanmin Wake up ldisc device before calling its driver to access the device. Signed-off-by: Zhang Yanmin --- drivers/tty/n_gsm.c | 40 +++- 1 file changed, 39 insertions(+), 1 deletion(-) diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index 2c34c32..f887df6 100644 --- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -62,6 +62,7 @@ #include #include #include +#include static int debug; module_param(debug, int, 0600); @@ -555,6 +556,27 @@ static int gsm_stuff_frame(const u8 *input, u8 *output, int len) return olen; } +static int pm_runtime_get_sync_tty(struct tty_struct *tty) +{ + int ret = 0; + + /* Wakeup parent as tty itself doesn't enable runtime */ + if (tty->dev->parent) + ret = pm_runtime_get_sync(tty->dev->parent); + + return ret; +} + +static int pm_runtime_put_tty(struct tty_struct *tty) +{ + int ret = 0; + + if (tty->dev->parent) + ret = pm_runtime_put(tty->dev->parent); + + return ret; +} + /** * gsm_send- send a control frame * @gsm: our GSM mux @@ -1511,7 +1533,9 @@ static void gsm_dlci_begin_open(struct gsm_dlci *dlci) return; dlci->retries = gsm->n2; dlci->state = DLCI_OPENING; + pm_runtime_get_sync_tty(gsm->tty); gsm_command(dlci->gsm, dlci->addr, SABM|PF); + pm_runtime_put_tty(gsm->tty); mod_timer(>t1, jiffies + gsm->t1 * HZ / 100); } @@ -1533,7 +1557,9 @@ static void gsm_dlci_begin_close(struct gsm_dlci *dlci) return; dlci->retries = gsm->n2; dlci->state = DLCI_CLOSING; + pm_runtime_get_sync_tty(gsm->tty); gsm_command(dlci->gsm, dlci->addr, DISC|PF); + pm_runtime_put_tty(gsm->tty); mod_timer(>t1, jiffies + gsm->t1 * HZ / 100); } @@ -2286,7 +2312,9 @@ static void gsmld_receive_buf(struct tty_struct *tty, const unsigned char *cp, flags = *f++; switch (flags) { case TTY_NORMAL: + pm_runtime_get_sync_tty(gsm->tty); gsm->receive(gsm, *dp); + pm_runtime_put_tty(gsm->tty); break; case TTY_OVERRUN: case TTY_BREAK: @@ -2957,6 +2985,7 @@ static int gsmtty_open(struct tty_struct *tty, struct file *filp) { struct gsm_dlci *dlci = tty->driver_data; struct tty_port *port = >port; + int ret; port->count++; tty_port_tty_set(port, tty); @@ -2968,7 +2997,11 @@ static int gsmtty_open(struct tty_struct *tty, struct file *filp) /* Start sending off SABM messages */ gsm_dlci_begin_open(dlci); /* And wait for virtual carrier */ - return tty_port_block_til_ready(port, tty, filp); + pm_runtime_get_sync_tty(dlci->gsm->tty); + ret = tty_port_block_til_ready(port, tty, filp); + pm_runtime_put_tty(dlci->gsm->tty); + + return ret; } static void gsmtty_close(struct tty_struct *tty, struct file *filp) @@ -2986,11 +3019,14 @@ static void gsmtty_close(struct tty_struct *tty, struct file *filp) gsm = dlci->gsm; if (tty_port_close_start(>port, tty, filp) == 0) return; + pm_runtime_get_sync_tty(gsm->tty); gsm_dlci_begin_close(dlci); if (test_bit(ASYNCB_INITIALIZED, >port.flags)) { if (C_HUPCL(tty)) tty_port_lower_dtr_rts(>port); } + pm_runtime_put_tty(gsm->tty); + tty_port_close_end(>port, tty); tty_port_tty_set(>port, NULL); return; @@ -3012,10 +3048,12 @@ static int gsmtty_write(struct tty_struct *tty, const unsigned char *buf, struct gsm_dlci *dlci = tty->driver_data; if (dlci->state == DLCI_CLOSED) return -EINVAL; + pm_runtime_get_sync_tty(dlci->gsm->tty); /* Stuff the bytes into the fifo queue */ sent = kfifo_in_locked(dlci->fifo, buf, len, >lock); /* Need to kick the channel */ gsm_dlci_data_kick(dlci); + pm_runtime_put_tty(dlci->gsm->tty); return sent; } -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH V3 2/3] cdc-acm: call usb_autopm_get_interface_upgrade in acm_tty_write
From: Zhang Yanmin acm device might be used as ldisc device by n_gsm driver. gsmtty_write and other gsm functions calls acm_tty_write indirectly while they holds spinlocks. Meanwhile, application might access ACM tty device directly. Here we choose to call usb_autopm_get_interface_upgrade instead of usb_autopm_get_interface_async to make sure above 2 scenarios can work well. Signed-off-by: Zhang Yanmin --- drivers/usb/class/cdc-acm.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c index 5c8f581..6ad85a3 100644 --- a/drivers/usb/class/cdc-acm.c +++ b/drivers/usb/class/cdc-acm.c @@ -689,10 +689,15 @@ static int acm_tty_write(struct tty_struct *tty, dev_vdbg(>data->dev, "%s - count %d\n", __func__, count); + stat = usb_autopm_get_interface_upgrade(acm->control); + if (stat) + return stat; + spin_lock_irqsave(>write_lock, flags); wbn = acm_wb_alloc(acm); if (wbn < 0) { spin_unlock_irqrestore(>write_lock, flags); + usb_autopm_put_interface(acm->control); return 0; } wb = >wb[wbn]; @@ -700,6 +705,7 @@ static int acm_tty_write(struct tty_struct *tty, if (!acm->dev) { wb->use = 0; spin_unlock_irqrestore(>write_lock, flags); + usb_autopm_put_interface(acm->control); return -ENODEV; } @@ -708,13 +714,6 @@ static int acm_tty_write(struct tty_struct *tty, memcpy(wb->buf, buf, count); wb->len = count; - stat = usb_autopm_get_interface_async(acm->control); - if (stat) { - wb->use = 0; - spin_unlock_irqrestore(>write_lock, flags); - return stat; - } - if (acm->susp_count) { usb_anchor_urb(wb->urb, >delayed); spin_unlock_irqrestore(>write_lock, flags); -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH V3 0/3] cdc-acm: fix incorrect runtime wakeup in acm_tty_write
Resend as V1/V2 have email format issue. Sorry for bothering. I use Thunderbird. It has no a button to enable LKML email simply. :) V3: Change email config to resend. Add a space in comment. --- There is a scenario about cdc-acm utilization.Application opens n_gsm tty and cdc-acm tty. cdc-acm tty connects to xhci device. The application configures cdc-adm tty to n_gsm tty as ldisc tty. n_gsm=>cdc-acm=>xhci driver acm_tty_write can be called from n_gsm driver by ldisc connection, and from application when application opens cdc-acm tty directly. acm_tty_write wakes up the device by calling usb_autopm_get_interface_async, which calls pm_runtime_get. However, pm_runtime_get can't wake up the device before returning as it's an async wake up. Then, acm_tty_write might access the device when it is off. The patchset fixes it by: 1) add a new function usb_autopm_get_interface_upgrade to deal with above 2 requirements; 2) wake up device in n_gsm driver if n_gsm drivers calls cdc-acm driver; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 3/3] n_gsm: wake up ldisc tty before using it
On 2015/5/27 11:02, Greg KH wrote: > On Wed, May 27, 2015 at 10:50:01AM +0800, Zhang, Yanmin wrote: >> Wake up ldisc device before calling its driver to access the device. >> >> Signed-off-by: Zhang Yanmin >> >> --- >> >> drivers/tty/n_gsm.c | 40 +++- >> 1 file changed, 39 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c >> index 2c34c32..40671fa 100644 >> --- a/drivers/tty/n_gsm.c >> +++ b/drivers/tty/n_gsm.c >> @@ -62,6 +62,7 @@ >> #include >> #include >> #include >> +#include >> >> static int debug; >> module_param(debug, int, 0600); >> @@ -555,6 +556,27 @@ static int gsm_stuff_frame(const u8 *input, u8 *output, >> int len) >> return olen; >> } >> >> +static int pm_runtime_get_sync_tty(struct tty_struct *tty) >> +{ >> +int ret = 0; >> + >> +/*Wakeup parent as tty itself doesn't enable runtime*/ > No spaces in your comment? I will add it. > > Anyway, this is corrupted and can't be applied, please fix up your email > client and try it again... I check the patch by scripts/checkpatch.pl and everything seems good. I use Thunderbird 31.7.0, the latest. It auto updates to the latest version. Perhaps some new config options changed something. It seems email client converts some tab to space automatically. Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH V2 3/3] n_gsm: wake up ldisc tty before using it
Wake up ldisc device before calling its driver to access the device. Signed-off-by: Zhang Yanmin --- drivers/tty/n_gsm.c | 40 +++- 1 file changed, 39 insertions(+), 1 deletion(-) diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index 2c34c32..40671fa 100644 --- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -62,6 +62,7 @@ #include #include #include +#include static int debug; module_param(debug, int, 0600); @@ -555,6 +556,27 @@ static int gsm_stuff_frame(const u8 *input, u8 *output, int len) return olen; } +static int pm_runtime_get_sync_tty(struct tty_struct *tty) +{ +int ret = 0; + +/*Wakeup parent as tty itself doesn't enable runtime*/ +if (tty->dev->parent) +ret = pm_runtime_get_sync(tty->dev->parent); + +return ret; +} + +static int pm_runtime_put_tty(struct tty_struct *tty) +{ +int ret = 0; + +if (tty->dev->parent) +ret = pm_runtime_put(tty->dev->parent); + +return ret; +} + /** *gsm_send-send a control frame *@gsm: our GSM mux @@ -1511,7 +1533,9 @@ static void gsm_dlci_begin_open(struct gsm_dlci *dlci) return; dlci->retries = gsm->n2; dlci->state = DLCI_OPENING; +pm_runtime_get_sync_tty(gsm->tty); gsm_command(dlci->gsm, dlci->addr, SABM|PF); +pm_runtime_put_tty(gsm->tty); mod_timer(>t1, jiffies + gsm->t1 * HZ / 100); } @@ -1533,7 +1557,9 @@ static void gsm_dlci_begin_close(struct gsm_dlci *dlci) return; dlci->retries = gsm->n2; dlci->state = DLCI_CLOSING; +pm_runtime_get_sync_tty(gsm->tty); gsm_command(dlci->gsm, dlci->addr, DISC|PF); +pm_runtime_put_tty(gsm->tty); mod_timer(>t1, jiffies + gsm->t1 * HZ / 100); } @@ -2286,7 +2312,9 @@ static void gsmld_receive_buf(struct tty_struct *tty, const unsigned char *cp, flags = *f++; switch (flags) { case TTY_NORMAL: +pm_runtime_get_sync_tty(gsm->tty); gsm->receive(gsm, *dp); +pm_runtime_put_tty(gsm->tty); break; case TTY_OVERRUN: case TTY_BREAK: @@ -2957,6 +2985,7 @@ static int gsmtty_open(struct tty_struct *tty, struct file *filp) { struct gsm_dlci *dlci = tty->driver_data; struct tty_port *port = >port; +int ret; port->count++; tty_port_tty_set(port, tty); @@ -2968,7 +2997,11 @@ static int gsmtty_open(struct tty_struct *tty, struct file *filp) /* Start sending off SABM messages */ gsm_dlci_begin_open(dlci); /* And wait for virtual carrier */ -return tty_port_block_til_ready(port, tty, filp); +pm_runtime_get_sync_tty(dlci->gsm->tty); +ret = tty_port_block_til_ready(port, tty, filp); +pm_runtime_put_tty(dlci->gsm->tty); + +return ret; } static void gsmtty_close(struct tty_struct *tty, struct file *filp) @@ -2986,11 +3019,14 @@ static void gsmtty_close(struct tty_struct *tty, struct file *filp) gsm = dlci->gsm; if (tty_port_close_start(>port, tty, filp) == 0) return; +pm_runtime_get_sync_tty(gsm->tty); gsm_dlci_begin_close(dlci); if (test_bit(ASYNCB_INITIALIZED, >port.flags)) { if (C_HUPCL(tty)) tty_port_lower_dtr_rts(>port); } +pm_runtime_put_tty(gsm->tty); + tty_port_close_end(>port, tty); tty_port_tty_set(>port, NULL); return; @@ -3012,10 +3048,12 @@ static int gsmtty_write(struct tty_struct *tty, const unsigned char *buf, struct gsm_dlci *dlci = tty->driver_data; if (dlci->state == DLCI_CLOSED) return -EINVAL; +pm_runtime_get_sync_tty(dlci->gsm->tty); /* Stuff the bytes into the fifo queue */ sent = kfifo_in_locked(dlci->fifo, buf, len, >lock); /* Need to kick the channel */ gsm_dlci_data_kick(dlci); +pm_runtime_put_tty(dlci->gsm->tty); return sent; } -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH V2 2/3] cdc-acm: call usb_autopm_get_interface_upgrade in acm_tty_write
acm device might be used as ldisc device by n_gsm driver. gsmtty_write and other gsm functions calls acm_tty_write indirectly while they holds spinlocks. Meanwhile, application might access ACM tty device directly. Here we choose to call usb_autopm_get_interface_upgrade instead of usb_autopm_get_interface_async to make sure above 2 scenarios can work well. Signed-off-by: Zhang Yanmin --- drivers/usb/class/cdc-acm.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c index 5c8f581..6ad85a3 100644 --- a/drivers/usb/class/cdc-acm.c +++ b/drivers/usb/class/cdc-acm.c @@ -689,10 +689,15 @@ static int acm_tty_write(struct tty_struct *tty, dev_vdbg(>data->dev, "%s - count %d\n", __func__, count); +stat = usb_autopm_get_interface_upgrade(acm->control); +if (stat) +return stat; + spin_lock_irqsave(>write_lock, flags); wbn = acm_wb_alloc(acm); if (wbn < 0) { spin_unlock_irqrestore(>write_lock, flags); +usb_autopm_put_interface(acm->control); return 0; } wb = >wb[wbn]; @@ -700,6 +705,7 @@ static int acm_tty_write(struct tty_struct *tty, if (!acm->dev) { wb->use = 0; spin_unlock_irqrestore(>write_lock, flags); +usb_autopm_put_interface(acm->control); return -ENODEV; } @@ -708,13 +714,6 @@ static int acm_tty_write(struct tty_struct *tty, memcpy(wb->buf, buf, count); wb->len = count; -stat = usb_autopm_get_interface_async(acm->control); -if (stat) { -wb->use = 0; -spin_unlock_irqrestore(>write_lock, flags); -return stat; -} - if (acm->susp_count) { usb_anchor_urb(wb->urb, >delayed); spin_unlock_irqrestore(>write_lock, flags); -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH V2 1/3] usb: add function usb_autopm_get_interface_upgrade
Some usb driver has a specific requirement. Their critical functions might be called under both atomic environment and non-atomic environment. If it's under atomic environment, the driver can wake up the device by calling pm_runtime_get_sync directly. If it's under non-atomic environment, the function's caller need wake up the device before the function accesses the device. The patch adds usb_autopm_get_interface_upgrade, a new function to support above capability. Signed-off-by: Zhang Yanmin --- drivers/usb/core/driver.c | 54 +++ include/linux/usb.h | 3 +++ 2 files changed, 57 insertions(+) diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c index 818369a..5d6f9ee 100644 --- a/drivers/usb/core/driver.c +++ b/drivers/usb/core/driver.c @@ -1684,6 +1684,60 @@ int usb_autopm_get_interface(struct usb_interface *intf) EXPORT_SYMBOL_GPL(usb_autopm_get_interface); /** + * usb_autopm_get_interface_upgrade - increment a USB interface's PM-usage + * counter + * @intf: the usb_interface whose counter should be incremented + * + * This routine should be called by an interface driver when it wants to + * use @intf and needs to guarantee that it is not suspended. In addition, + * the routine prevents @intf from being autosuspended subsequently. (Note + * that this will not prevent suspend events originating in the PM core.) + * This prevention will persist until usb_autopm_put_interface() is called + * or @intf is unbound. A typical example would be a character-device + * driver when its device file is opened. + * + * Comparing with usb_autopm_get_interface, usb_autopm_get_interface_upgrade + * is more careful when resuming the device. + * 1) The caller's caller already resumes resume the device and hold spinlocks. + * usb_autopm_get_interface_upgrade couldn't call pm_runtime_get_sync; + * 2) The caller's caller doesn't resume the device. + * usb_autopm_get_interface_upgrade has to resume the device before going ahead. + * + * @intf's usage counter is incremented to prevent subsequent autosuspends. + * However if the autoresume fails then the counter is re-decremented. + * + * This routine can run only in process context. + * + * Return: 0 on success. + */ +int usb_autopm_get_interface_upgrade(struct usb_interface *intf) +{ +intstatus = 0; + +pm_runtime_get(>dev); +if (!pm_runtime_active(>dev)) { +/* If not active, next _get_sync wakes device up*/ +status = pm_runtime_get_sync(>dev); +/* + * If it's active, next _put_sync wouldn't + * really put it to sleep as the 1st _get + * keeps the device active. + */ +pm_runtime_put_sync(>dev); +if (status < 0) +pm_runtime_put(>dev); +} else +atomic_inc(>pm_usage_cnt); +dev_vdbg(>dev, "%s: cnt %d -> %d\n", +__func__, atomic_read(>dev.power.usage_count), +status); +if (status > 0) +status = 0; +return status; +} +EXPORT_SYMBOL_GPL(usb_autopm_get_interface_upgrade); + +/** * usb_autopm_get_interface_async - increment a USB interface's PM-usage counter * @intf: the usb_interface whose counter should be incremented * diff --git a/include/linux/usb.h b/include/linux/usb.h index 447fe29..0a8a44a 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -663,6 +663,7 @@ extern void usb_enable_autosuspend(struct usb_device *udev); extern void usb_disable_autosuspend(struct usb_device *udev); extern int usb_autopm_get_interface(struct usb_interface *intf); +extern int usb_autopm_get_interface_upgrade(struct usb_interface *intf); extern void usb_autopm_put_interface(struct usb_interface *intf); extern int usb_autopm_get_interface_async(struct usb_interface *intf); extern void usb_autopm_put_interface_async(struct usb_interface *intf); @@ -683,6 +684,8 @@ static inline int usb_disable_autosuspend(struct usb_device *udev) static inline int usb_autopm_get_interface(struct usb_interface *intf) { return 0; } +static inline int usb_autopm_get_interface_upgrade(struct usb_interface *intf) +{ return 0; } static inline int usb_autopm_get_interface_async(struct usb_interface *intf) { return 0; } -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH V2 0/3] cdc-acm: fix incorrect runtime wakeup in acm_tty_write
Resend as V1 has email format issue. Sorry for bothering. --- There is a scenario about cdc-acm utilization.Application opens n_gsm tty and cdc-acm tty. cdc-acm tty connects to xhci device. The application configures cdc-adm tty to n_gsm tty as ldisc tty. n_gsm=>cdc-acm=>xhci driver acm_tty_write can be called from n_gsm driver by ldisc connection, and from application when application opens cdc-acm tty directly. acm_tty_write wakes up the device by calling usb_autopm_get_interface_async, which calls pm_runtime_get. However, pm_runtime_get can't wake up the device before returning as it's an async wake up. Then, acm_tty_write might access the device when it is off. The patchset fixes it by: 1) add a new function usb_autopm_get_interface_upgrade to deal with above 2 requirements; 2) wake up device in n_gsm driver if n_gsm drivers calls cdc-acm driver; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] usb: add function usb_autopm_get_interface_upgrade
On 2015/5/26 22:12, Greg Kroah-Hartman wrote: On Tue, May 26, 2015 at 10:19:57AM +0800, Zhang, Yanmin wrote: Some usb driver has a specific requirement. Their critical functions might be called under both atomic environment and non-atomic environment. If it's under atomic environment, the driver can wake up the device by calling pm_runtime_get_sync directly. If it's under non-atomic environment, the function's caller need wake up the device before the function accesses the device. The patch adds usb_autopm_get_interface_upgrade, a new function to support above capability. Signed-off-by: Zhang Yanmin --- drivers/usb/core/driver.c | 53 +++ include/linux/usb.h | 3 +++ 2 files changed, 56 insertions(+) diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c index 818369a..d07fd8d 100644 --- a/drivers/usb/core/driver.c +++ b/drivers/usb/core/driver.c @@ -1684,6 +1684,59 @@ int usb_autopm_get_interface(struct usb_interface *intf) EXPORT_SYMBOL_GPL(usb_autopm_get_interface); /** + * usb_autopm_get_interface_retry - increment a USB interface's PM-usage counter + * @intf: the usb_interface whose counter should be incremented Your emails are obviously corrupted and can't be applied.. Please fix up and resend all of them. Sorry. I didn't send patches for a long time. I would resend the patchset again. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] usb: add function usb_autopm_get_interface_upgrade
On 2015/5/26 22:12, Greg Kroah-Hartman wrote: On Tue, May 26, 2015 at 10:19:57AM +0800, Zhang, Yanmin wrote: Some usb driver has a specific requirement. Their critical functions might be called under both atomic environment and non-atomic environment. If it's under atomic environment, the driver can wake up the device by calling pm_runtime_get_sync directly. If it's under non-atomic environment, the function's caller need wake up the device before the function accesses the device. The patch adds usb_autopm_get_interface_upgrade, a new function to support above capability. Signed-off-by: Zhang Yanmin yanmin.zh...@intel.com --- drivers/usb/core/driver.c | 53 +++ include/linux/usb.h | 3 +++ 2 files changed, 56 insertions(+) diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c index 818369a..d07fd8d 100644 --- a/drivers/usb/core/driver.c +++ b/drivers/usb/core/driver.c @@ -1684,6 +1684,59 @@ int usb_autopm_get_interface(struct usb_interface *intf) EXPORT_SYMBOL_GPL(usb_autopm_get_interface); /** + * usb_autopm_get_interface_retry - increment a USB interface's PM-usage counter + * @intf: the usb_interface whose counter should be incremented Your emails are obviously corrupted and can't be applied.. Please fix up and resend all of them. Sorry. I didn't send patches for a long time. I would resend the patchset again. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH V2 1/3] usb: add function usb_autopm_get_interface_upgrade
Some usb driver has a specific requirement. Their critical functions might be called under both atomic environment and non-atomic environment. If it's under atomic environment, the driver can wake up the device by calling pm_runtime_get_sync directly. If it's under non-atomic environment, the function's caller need wake up the device before the function accesses the device. The patch adds usb_autopm_get_interface_upgrade, a new function to support above capability. Signed-off-by: Zhang Yanmin yanmin.zh...@intel.com --- drivers/usb/core/driver.c | 54 +++ include/linux/usb.h | 3 +++ 2 files changed, 57 insertions(+) diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c index 818369a..5d6f9ee 100644 --- a/drivers/usb/core/driver.c +++ b/drivers/usb/core/driver.c @@ -1684,6 +1684,60 @@ int usb_autopm_get_interface(struct usb_interface *intf) EXPORT_SYMBOL_GPL(usb_autopm_get_interface); /** + * usb_autopm_get_interface_upgrade - increment a USB interface's PM-usage + * counter + * @intf: the usb_interface whose counter should be incremented + * + * This routine should be called by an interface driver when it wants to + * use @intf and needs to guarantee that it is not suspended. In addition, + * the routine prevents @intf from being autosuspended subsequently. (Note + * that this will not prevent suspend events originating in the PM core.) + * This prevention will persist until usb_autopm_put_interface() is called + * or @intf is unbound. A typical example would be a character-device + * driver when its device file is opened. + * + * Comparing with usb_autopm_get_interface, usb_autopm_get_interface_upgrade + * is more careful when resuming the device. + * 1) The caller's caller already resumes resume the device and hold spinlocks. + * usb_autopm_get_interface_upgrade couldn't call pm_runtime_get_sync; + * 2) The caller's caller doesn't resume the device. + * usb_autopm_get_interface_upgrade has to resume the device before going ahead. + * + * @intf's usage counter is incremented to prevent subsequent autosuspends. + * However if the autoresume fails then the counter is re-decremented. + * + * This routine can run only in process context. + * + * Return: 0 on success. + */ +int usb_autopm_get_interface_upgrade(struct usb_interface *intf) +{ +intstatus = 0; + +pm_runtime_get(intf-dev); +if (!pm_runtime_active(intf-dev)) { +/* If not active, next _get_sync wakes device up*/ +status = pm_runtime_get_sync(intf-dev); +/* + * If it's active, next _put_sync wouldn't + * really put it to sleep as the 1st _get + * keeps the device active. + */ +pm_runtime_put_sync(intf-dev); +if (status 0) +pm_runtime_put(intf-dev); +} else +atomic_inc(intf-pm_usage_cnt); +dev_vdbg(intf-dev, %s: cnt %d - %d\n, +__func__, atomic_read(intf-dev.power.usage_count), +status); +if (status 0) +status = 0; +return status; +} +EXPORT_SYMBOL_GPL(usb_autopm_get_interface_upgrade); + +/** * usb_autopm_get_interface_async - increment a USB interface's PM-usage counter * @intf: the usb_interface whose counter should be incremented * diff --git a/include/linux/usb.h b/include/linux/usb.h index 447fe29..0a8a44a 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -663,6 +663,7 @@ extern void usb_enable_autosuspend(struct usb_device *udev); extern void usb_disable_autosuspend(struct usb_device *udev); extern int usb_autopm_get_interface(struct usb_interface *intf); +extern int usb_autopm_get_interface_upgrade(struct usb_interface *intf); extern void usb_autopm_put_interface(struct usb_interface *intf); extern int usb_autopm_get_interface_async(struct usb_interface *intf); extern void usb_autopm_put_interface_async(struct usb_interface *intf); @@ -683,6 +684,8 @@ static inline int usb_disable_autosuspend(struct usb_device *udev) static inline int usb_autopm_get_interface(struct usb_interface *intf) { return 0; } +static inline int usb_autopm_get_interface_upgrade(struct usb_interface *intf) +{ return 0; } static inline int usb_autopm_get_interface_async(struct usb_interface *intf) { return 0; } -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH V2 0/3] cdc-acm: fix incorrect runtime wakeup in acm_tty_write
Resend as V1 has email format issue. Sorry for bothering. --- There is a scenario about cdc-acm utilization.Application opens n_gsm tty and cdc-acm tty. cdc-acm tty connects to xhci device. The application configures cdc-adm tty to n_gsm tty as ldisc tty. n_gsm=cdc-acm=xhci driver acm_tty_write can be called from n_gsm driver by ldisc connection, and from application when application opens cdc-acm tty directly. acm_tty_write wakes up the device by calling usb_autopm_get_interface_async, which calls pm_runtime_get. However, pm_runtime_get can't wake up the device before returning as it's an async wake up. Then, acm_tty_write might access the device when it is off. The patchset fixes it by: 1) add a new function usb_autopm_get_interface_upgrade to deal with above 2 requirements; 2) wake up device in n_gsm driver if n_gsm drivers calls cdc-acm driver; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH V2 3/3] n_gsm: wake up ldisc tty before using it
Wake up ldisc device before calling its driver to access the device. Signed-off-by: Zhang Yanmin yanmin.zh...@intel.com --- drivers/tty/n_gsm.c | 40 +++- 1 file changed, 39 insertions(+), 1 deletion(-) diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index 2c34c32..40671fa 100644 --- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -62,6 +62,7 @@ #include linux/netdevice.h #include linux/etherdevice.h #include linux/gsmmux.h +#include linux/pm_runtime.h static int debug; module_param(debug, int, 0600); @@ -555,6 +556,27 @@ static int gsm_stuff_frame(const u8 *input, u8 *output, int len) return olen; } +static int pm_runtime_get_sync_tty(struct tty_struct *tty) +{ +int ret = 0; + +/*Wakeup parent as tty itself doesn't enable runtime*/ +if (tty-dev-parent) +ret = pm_runtime_get_sync(tty-dev-parent); + +return ret; +} + +static int pm_runtime_put_tty(struct tty_struct *tty) +{ +int ret = 0; + +if (tty-dev-parent) +ret = pm_runtime_put(tty-dev-parent); + +return ret; +} + /** *gsm_send-send a control frame *@gsm: our GSM mux @@ -1511,7 +1533,9 @@ static void gsm_dlci_begin_open(struct gsm_dlci *dlci) return; dlci-retries = gsm-n2; dlci-state = DLCI_OPENING; +pm_runtime_get_sync_tty(gsm-tty); gsm_command(dlci-gsm, dlci-addr, SABM|PF); +pm_runtime_put_tty(gsm-tty); mod_timer(dlci-t1, jiffies + gsm-t1 * HZ / 100); } @@ -1533,7 +1557,9 @@ static void gsm_dlci_begin_close(struct gsm_dlci *dlci) return; dlci-retries = gsm-n2; dlci-state = DLCI_CLOSING; +pm_runtime_get_sync_tty(gsm-tty); gsm_command(dlci-gsm, dlci-addr, DISC|PF); +pm_runtime_put_tty(gsm-tty); mod_timer(dlci-t1, jiffies + gsm-t1 * HZ / 100); } @@ -2286,7 +2312,9 @@ static void gsmld_receive_buf(struct tty_struct *tty, const unsigned char *cp, flags = *f++; switch (flags) { case TTY_NORMAL: +pm_runtime_get_sync_tty(gsm-tty); gsm-receive(gsm, *dp); +pm_runtime_put_tty(gsm-tty); break; case TTY_OVERRUN: case TTY_BREAK: @@ -2957,6 +2985,7 @@ static int gsmtty_open(struct tty_struct *tty, struct file *filp) { struct gsm_dlci *dlci = tty-driver_data; struct tty_port *port = dlci-port; +int ret; port-count++; tty_port_tty_set(port, tty); @@ -2968,7 +2997,11 @@ static int gsmtty_open(struct tty_struct *tty, struct file *filp) /* Start sending off SABM messages */ gsm_dlci_begin_open(dlci); /* And wait for virtual carrier */ -return tty_port_block_til_ready(port, tty, filp); +pm_runtime_get_sync_tty(dlci-gsm-tty); +ret = tty_port_block_til_ready(port, tty, filp); +pm_runtime_put_tty(dlci-gsm-tty); + +return ret; } static void gsmtty_close(struct tty_struct *tty, struct file *filp) @@ -2986,11 +3019,14 @@ static void gsmtty_close(struct tty_struct *tty, struct file *filp) gsm = dlci-gsm; if (tty_port_close_start(dlci-port, tty, filp) == 0) return; +pm_runtime_get_sync_tty(gsm-tty); gsm_dlci_begin_close(dlci); if (test_bit(ASYNCB_INITIALIZED, dlci-port.flags)) { if (C_HUPCL(tty)) tty_port_lower_dtr_rts(dlci-port); } +pm_runtime_put_tty(gsm-tty); + tty_port_close_end(dlci-port, tty); tty_port_tty_set(dlci-port, NULL); return; @@ -3012,10 +3048,12 @@ static int gsmtty_write(struct tty_struct *tty, const unsigned char *buf, struct gsm_dlci *dlci = tty-driver_data; if (dlci-state == DLCI_CLOSED) return -EINVAL; +pm_runtime_get_sync_tty(dlci-gsm-tty); /* Stuff the bytes into the fifo queue */ sent = kfifo_in_locked(dlci-fifo, buf, len, dlci-lock); /* Need to kick the channel */ gsm_dlci_data_kick(dlci); +pm_runtime_put_tty(dlci-gsm-tty); return sent; } -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH V2 2/3] cdc-acm: call usb_autopm_get_interface_upgrade in acm_tty_write
acm device might be used as ldisc device by n_gsm driver. gsmtty_write and other gsm functions calls acm_tty_write indirectly while they holds spinlocks. Meanwhile, application might access ACM tty device directly. Here we choose to call usb_autopm_get_interface_upgrade instead of usb_autopm_get_interface_async to make sure above 2 scenarios can work well. Signed-off-by: Zhang Yanmin yanmin.zh...@intel.com --- drivers/usb/class/cdc-acm.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c index 5c8f581..6ad85a3 100644 --- a/drivers/usb/class/cdc-acm.c +++ b/drivers/usb/class/cdc-acm.c @@ -689,10 +689,15 @@ static int acm_tty_write(struct tty_struct *tty, dev_vdbg(acm-data-dev, %s - count %d\n, __func__, count); +stat = usb_autopm_get_interface_upgrade(acm-control); +if (stat) +return stat; + spin_lock_irqsave(acm-write_lock, flags); wbn = acm_wb_alloc(acm); if (wbn 0) { spin_unlock_irqrestore(acm-write_lock, flags); +usb_autopm_put_interface(acm-control); return 0; } wb = acm-wb[wbn]; @@ -700,6 +705,7 @@ static int acm_tty_write(struct tty_struct *tty, if (!acm-dev) { wb-use = 0; spin_unlock_irqrestore(acm-write_lock, flags); +usb_autopm_put_interface(acm-control); return -ENODEV; } @@ -708,13 +714,6 @@ static int acm_tty_write(struct tty_struct *tty, memcpy(wb-buf, buf, count); wb-len = count; -stat = usb_autopm_get_interface_async(acm-control); -if (stat) { -wb-use = 0; -spin_unlock_irqrestore(acm-write_lock, flags); -return stat; -} - if (acm-susp_count) { usb_anchor_urb(wb-urb, acm-delayed); spin_unlock_irqrestore(acm-write_lock, flags); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 3/3] n_gsm: wake up ldisc tty before using it
On 2015/5/27 11:02, Greg KH wrote: On Wed, May 27, 2015 at 10:50:01AM +0800, Zhang, Yanmin wrote: Wake up ldisc device before calling its driver to access the device. Signed-off-by: Zhang Yanmin yanmin.zh...@intel.com --- drivers/tty/n_gsm.c | 40 +++- 1 file changed, 39 insertions(+), 1 deletion(-) diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index 2c34c32..40671fa 100644 --- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -62,6 +62,7 @@ #include linux/netdevice.h #include linux/etherdevice.h #include linux/gsmmux.h +#include linux/pm_runtime.h static int debug; module_param(debug, int, 0600); @@ -555,6 +556,27 @@ static int gsm_stuff_frame(const u8 *input, u8 *output, int len) return olen; } +static int pm_runtime_get_sync_tty(struct tty_struct *tty) +{ +int ret = 0; + +/*Wakeup parent as tty itself doesn't enable runtime*/ No spaces in your comment? I will add it. Anyway, this is corrupted and can't be applied, please fix up your email client and try it again... I check the patch by scripts/checkpatch.pl and everything seems good. I use Thunderbird 31.7.0, the latest. It auto updates to the latest version. Perhaps some new config options changed something. It seems email client converts some tab to space automatically. Yanmin -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH V3 1/3] usb: add function usb_autopm_get_interface_upgrade
From: Zhang Yanmin yanmin.zh...@intel.com Some usb driver has a specific requirement. Their critical functions might be called under both atomic environment and non-atomic environment. If it's under atomic environment, the driver can wake up the device by calling pm_runtime_get_sync directly. If it's under non-atomic environment, the function's caller need wake up the device before the function accesses the device. The patch adds usb_autopm_get_interface_upgrade, a new function to support above capability. Signed-off-by: Zhang Yanmin yanmin.zh...@intel.com --- drivers/usb/core/driver.c | 54 +++ include/linux/usb.h | 3 +++ 2 files changed, 57 insertions(+) diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c index 818369a..5d6f9ee 100644 --- a/drivers/usb/core/driver.c +++ b/drivers/usb/core/driver.c @@ -1684,6 +1684,60 @@ int usb_autopm_get_interface(struct usb_interface *intf) EXPORT_SYMBOL_GPL(usb_autopm_get_interface); /** + * usb_autopm_get_interface_upgrade - increment a USB interface's PM-usage + * counter + * @intf: the usb_interface whose counter should be incremented + * + * This routine should be called by an interface driver when it wants to + * use @intf and needs to guarantee that it is not suspended. In addition, + * the routine prevents @intf from being autosuspended subsequently. (Note + * that this will not prevent suspend events originating in the PM core.) + * This prevention will persist until usb_autopm_put_interface() is called + * or @intf is unbound. A typical example would be a character-device + * driver when its device file is opened. + * + * Comparing with usb_autopm_get_interface, usb_autopm_get_interface_upgrade + * is more careful when resuming the device. + * 1) The caller's caller already resumes resume the device and hold spinlocks. + * usb_autopm_get_interface_upgrade couldn't call pm_runtime_get_sync; + * 2) The caller's caller doesn't resume the device. + * usb_autopm_get_interface_upgrade has to resume the device before going ahead. + * + * @intf's usage counter is incremented to prevent subsequent autosuspends. + * However if the autoresume fails then the counter is re-decremented. + * + * This routine can run only in process context. + * + * Return: 0 on success. + */ +int usb_autopm_get_interface_upgrade(struct usb_interface *intf) +{ + int status = 0; + + pm_runtime_get(intf-dev); + if (!pm_runtime_active(intf-dev)) { + /* If not active, next _get_sync wakes device up*/ + status = pm_runtime_get_sync(intf-dev); + /* +* If it's active, next _put_sync wouldn't +* really put it to sleep as the 1st _get +* keeps the device active. +*/ + pm_runtime_put_sync(intf-dev); + if (status 0) + pm_runtime_put(intf-dev); + } else + atomic_inc(intf-pm_usage_cnt); + dev_vdbg(intf-dev, %s: cnt %d - %d\n, + __func__, atomic_read(intf-dev.power.usage_count), + status); + if (status 0) + status = 0; + return status; +} +EXPORT_SYMBOL_GPL(usb_autopm_get_interface_upgrade); + +/** * usb_autopm_get_interface_async - increment a USB interface's PM-usage counter * @intf: the usb_interface whose counter should be incremented * diff --git a/include/linux/usb.h b/include/linux/usb.h index 447fe29..0a8a44a 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -663,6 +663,7 @@ extern void usb_enable_autosuspend(struct usb_device *udev); extern void usb_disable_autosuspend(struct usb_device *udev); extern int usb_autopm_get_interface(struct usb_interface *intf); +extern int usb_autopm_get_interface_upgrade(struct usb_interface *intf); extern void usb_autopm_put_interface(struct usb_interface *intf); extern int usb_autopm_get_interface_async(struct usb_interface *intf); extern void usb_autopm_put_interface_async(struct usb_interface *intf); @@ -683,6 +684,8 @@ static inline int usb_disable_autosuspend(struct usb_device *udev) static inline int usb_autopm_get_interface(struct usb_interface *intf) { return 0; } +static inline int usb_autopm_get_interface_upgrade(struct usb_interface *intf) +{ return 0; } static inline int usb_autopm_get_interface_async(struct usb_interface *intf) { return 0; } -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH V3 3/3] n_gsm: wake up ldisc tty before using it
From: Zhang Yanmin yanmin.zh...@intel.com Wake up ldisc device before calling its driver to access the device. Signed-off-by: Zhang Yanmin yanmin.zh...@intel.com --- drivers/tty/n_gsm.c | 40 +++- 1 file changed, 39 insertions(+), 1 deletion(-) diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index 2c34c32..f887df6 100644 --- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -62,6 +62,7 @@ #include linux/netdevice.h #include linux/etherdevice.h #include linux/gsmmux.h +#include linux/pm_runtime.h static int debug; module_param(debug, int, 0600); @@ -555,6 +556,27 @@ static int gsm_stuff_frame(const u8 *input, u8 *output, int len) return olen; } +static int pm_runtime_get_sync_tty(struct tty_struct *tty) +{ + int ret = 0; + + /* Wakeup parent as tty itself doesn't enable runtime */ + if (tty-dev-parent) + ret = pm_runtime_get_sync(tty-dev-parent); + + return ret; +} + +static int pm_runtime_put_tty(struct tty_struct *tty) +{ + int ret = 0; + + if (tty-dev-parent) + ret = pm_runtime_put(tty-dev-parent); + + return ret; +} + /** * gsm_send- send a control frame * @gsm: our GSM mux @@ -1511,7 +1533,9 @@ static void gsm_dlci_begin_open(struct gsm_dlci *dlci) return; dlci-retries = gsm-n2; dlci-state = DLCI_OPENING; + pm_runtime_get_sync_tty(gsm-tty); gsm_command(dlci-gsm, dlci-addr, SABM|PF); + pm_runtime_put_tty(gsm-tty); mod_timer(dlci-t1, jiffies + gsm-t1 * HZ / 100); } @@ -1533,7 +1557,9 @@ static void gsm_dlci_begin_close(struct gsm_dlci *dlci) return; dlci-retries = gsm-n2; dlci-state = DLCI_CLOSING; + pm_runtime_get_sync_tty(gsm-tty); gsm_command(dlci-gsm, dlci-addr, DISC|PF); + pm_runtime_put_tty(gsm-tty); mod_timer(dlci-t1, jiffies + gsm-t1 * HZ / 100); } @@ -2286,7 +2312,9 @@ static void gsmld_receive_buf(struct tty_struct *tty, const unsigned char *cp, flags = *f++; switch (flags) { case TTY_NORMAL: + pm_runtime_get_sync_tty(gsm-tty); gsm-receive(gsm, *dp); + pm_runtime_put_tty(gsm-tty); break; case TTY_OVERRUN: case TTY_BREAK: @@ -2957,6 +2985,7 @@ static int gsmtty_open(struct tty_struct *tty, struct file *filp) { struct gsm_dlci *dlci = tty-driver_data; struct tty_port *port = dlci-port; + int ret; port-count++; tty_port_tty_set(port, tty); @@ -2968,7 +2997,11 @@ static int gsmtty_open(struct tty_struct *tty, struct file *filp) /* Start sending off SABM messages */ gsm_dlci_begin_open(dlci); /* And wait for virtual carrier */ - return tty_port_block_til_ready(port, tty, filp); + pm_runtime_get_sync_tty(dlci-gsm-tty); + ret = tty_port_block_til_ready(port, tty, filp); + pm_runtime_put_tty(dlci-gsm-tty); + + return ret; } static void gsmtty_close(struct tty_struct *tty, struct file *filp) @@ -2986,11 +3019,14 @@ static void gsmtty_close(struct tty_struct *tty, struct file *filp) gsm = dlci-gsm; if (tty_port_close_start(dlci-port, tty, filp) == 0) return; + pm_runtime_get_sync_tty(gsm-tty); gsm_dlci_begin_close(dlci); if (test_bit(ASYNCB_INITIALIZED, dlci-port.flags)) { if (C_HUPCL(tty)) tty_port_lower_dtr_rts(dlci-port); } + pm_runtime_put_tty(gsm-tty); + tty_port_close_end(dlci-port, tty); tty_port_tty_set(dlci-port, NULL); return; @@ -3012,10 +3048,12 @@ static int gsmtty_write(struct tty_struct *tty, const unsigned char *buf, struct gsm_dlci *dlci = tty-driver_data; if (dlci-state == DLCI_CLOSED) return -EINVAL; + pm_runtime_get_sync_tty(dlci-gsm-tty); /* Stuff the bytes into the fifo queue */ sent = kfifo_in_locked(dlci-fifo, buf, len, dlci-lock); /* Need to kick the channel */ gsm_dlci_data_kick(dlci); + pm_runtime_put_tty(dlci-gsm-tty); return sent; } -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH V3 2/3] cdc-acm: call usb_autopm_get_interface_upgrade in acm_tty_write
From: Zhang Yanmin yanmin.zh...@intel.com acm device might be used as ldisc device by n_gsm driver. gsmtty_write and other gsm functions calls acm_tty_write indirectly while they holds spinlocks. Meanwhile, application might access ACM tty device directly. Here we choose to call usb_autopm_get_interface_upgrade instead of usb_autopm_get_interface_async to make sure above 2 scenarios can work well. Signed-off-by: Zhang Yanmin yanmin.zh...@intel.com --- drivers/usb/class/cdc-acm.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c index 5c8f581..6ad85a3 100644 --- a/drivers/usb/class/cdc-acm.c +++ b/drivers/usb/class/cdc-acm.c @@ -689,10 +689,15 @@ static int acm_tty_write(struct tty_struct *tty, dev_vdbg(acm-data-dev, %s - count %d\n, __func__, count); + stat = usb_autopm_get_interface_upgrade(acm-control); + if (stat) + return stat; + spin_lock_irqsave(acm-write_lock, flags); wbn = acm_wb_alloc(acm); if (wbn 0) { spin_unlock_irqrestore(acm-write_lock, flags); + usb_autopm_put_interface(acm-control); return 0; } wb = acm-wb[wbn]; @@ -700,6 +705,7 @@ static int acm_tty_write(struct tty_struct *tty, if (!acm-dev) { wb-use = 0; spin_unlock_irqrestore(acm-write_lock, flags); + usb_autopm_put_interface(acm-control); return -ENODEV; } @@ -708,13 +714,6 @@ static int acm_tty_write(struct tty_struct *tty, memcpy(wb-buf, buf, count); wb-len = count; - stat = usb_autopm_get_interface_async(acm-control); - if (stat) { - wb-use = 0; - spin_unlock_irqrestore(acm-write_lock, flags); - return stat; - } - if (acm-susp_count) { usb_anchor_urb(wb-urb, acm-delayed); spin_unlock_irqrestore(acm-write_lock, flags); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH V3 0/3] cdc-acm: fix incorrect runtime wakeup in acm_tty_write
Resend as V1/V2 have email format issue. Sorry for bothering. I use Thunderbird. It has no a button to enable LKML email simply. :) V3: Change email config to resend. Add a space in comment. --- There is a scenario about cdc-acm utilization.Application opens n_gsm tty and cdc-acm tty. cdc-acm tty connects to xhci device. The application configures cdc-adm tty to n_gsm tty as ldisc tty. n_gsm=cdc-acm=xhci driver acm_tty_write can be called from n_gsm driver by ldisc connection, and from application when application opens cdc-acm tty directly. acm_tty_write wakes up the device by calling usb_autopm_get_interface_async, which calls pm_runtime_get. However, pm_runtime_get can't wake up the device before returning as it's an async wake up. Then, acm_tty_write might access the device when it is off. The patchset fixes it by: 1) add a new function usb_autopm_get_interface_upgrade to deal with above 2 requirements; 2) wake up device in n_gsm driver if n_gsm drivers calls cdc-acm driver; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V3 0/3] cdc-acm: fix incorrect runtime wakeup in acm_tty_write
On 2015/5/27 12:13, Zhang, Yanmin wrote: Resend as V1/V2 have email format issue. Sorry for bothering. Greg, We have to abandon this patchset. Zhuang Jin Can, a USB expert, reviewed the patches. He says acm_tty_write already considers it carefully. acm_tty_write puts the urb to a delayed queue when acm-susp_count is not 0. acm_suspend adds 1 to acm-susp_count and acm_resume decreases 1 from it . Sorry for the bothering. Also thank Jin Can for the comments. Yanmin I use Thunderbird. It has no a button to enable LKML email simply. :) V3: Change email config to resend. Add a space in comment. --- There is a scenario about cdc-acm utilization.Application opens n_gsm tty and cdc-acm tty. cdc-acm tty connects to xhci device. The application configures cdc-adm tty to n_gsm tty as ldisc tty. n_gsm=cdc-acm=xhci driver acm_tty_write can be called from n_gsm driver by ldisc connection, and from application when application opens cdc-acm tty directly. acm_tty_write wakes up the device by calling usb_autopm_get_interface_async, which calls pm_runtime_get. However, pm_runtime_get can't wake up the device before returning as it's an async wake up. Then, acm_tty_write might access the device when it is off. The patchset fixes it by: 1) add a new function usb_autopm_get_interface_upgrade to deal with above 2 requirements; 2) wake up device in n_gsm driver if n_gsm drivers calls cdc-acm driver; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/3] n_gsm: wake up ldisc tty before using it
Wake up ldisc device before calling its driver to access the device. Signed-off-by: Zhang Yanmin --- drivers/tty/n_gsm.c | 40 +++- 1 file changed, 39 insertions(+), 1 deletion(-) diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index 2c34c32..40671fa 100644 --- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -62,6 +62,7 @@ #include #include #include +#include static int debug; module_param(debug, int, 0600); @@ -555,6 +556,27 @@ static int gsm_stuff_frame(const u8 *input, u8 *output, int len) return olen; } +static int pm_runtime_get_sync_tty(struct tty_struct *tty) +{ +int ret = 0; + +/*Wakeup parent as tty itself doesn't enable runtime*/ +if (tty->dev->parent) +ret = pm_runtime_get_sync(tty->dev->parent); + +return ret; +} + +static int pm_runtime_put_tty(struct tty_struct *tty) +{ +int ret = 0; + +if (tty->dev->parent) +ret = pm_runtime_put(tty->dev->parent); + +return ret; +} + /** *gsm_send-send a control frame *@gsm: our GSM mux @@ -1511,7 +1533,9 @@ static void gsm_dlci_begin_open(struct gsm_dlci *dlci) return; dlci->retries = gsm->n2; dlci->state = DLCI_OPENING; +pm_runtime_get_sync_tty(gsm->tty); gsm_command(dlci->gsm, dlci->addr, SABM|PF); +pm_runtime_put_tty(gsm->tty); mod_timer(>t1, jiffies + gsm->t1 * HZ / 100); } @@ -1533,7 +1557,9 @@ static void gsm_dlci_begin_close(struct gsm_dlci *dlci) return; dlci->retries = gsm->n2; dlci->state = DLCI_CLOSING; +pm_runtime_get_sync_tty(gsm->tty); gsm_command(dlci->gsm, dlci->addr, DISC|PF); +pm_runtime_put_tty(gsm->tty); mod_timer(>t1, jiffies + gsm->t1 * HZ / 100); } @@ -2286,7 +2312,9 @@ static void gsmld_receive_buf(struct tty_struct *tty, const unsigned char *cp, flags = *f++; switch (flags) { case TTY_NORMAL: +pm_runtime_get_sync_tty(gsm->tty); gsm->receive(gsm, *dp); +pm_runtime_put_tty(gsm->tty); break; case TTY_OVERRUN: case TTY_BREAK: @@ -2957,6 +2985,7 @@ static int gsmtty_open(struct tty_struct *tty, struct file *filp) { struct gsm_dlci *dlci = tty->driver_data; struct tty_port *port = >port; +int ret; port->count++; tty_port_tty_set(port, tty); @@ -2968,7 +2997,11 @@ static int gsmtty_open(struct tty_struct *tty, struct file *filp) /* Start sending off SABM messages */ gsm_dlci_begin_open(dlci); /* And wait for virtual carrier */ -return tty_port_block_til_ready(port, tty, filp); +pm_runtime_get_sync_tty(dlci->gsm->tty); +ret = tty_port_block_til_ready(port, tty, filp); +pm_runtime_put_tty(dlci->gsm->tty); + +return ret; } static void gsmtty_close(struct tty_struct *tty, struct file *filp) @@ -2986,11 +3019,14 @@ static void gsmtty_close(struct tty_struct *tty, struct file *filp) gsm = dlci->gsm; if (tty_port_close_start(>port, tty, filp) == 0) return; +pm_runtime_get_sync_tty(gsm->tty); gsm_dlci_begin_close(dlci); if (test_bit(ASYNCB_INITIALIZED, >port.flags)) { if (C_HUPCL(tty)) tty_port_lower_dtr_rts(>port); } +pm_runtime_put_tty(gsm->tty); + tty_port_close_end(>port, tty); tty_port_tty_set(>port, NULL); return; @@ -3012,10 +3048,12 @@ static int gsmtty_write(struct tty_struct *tty, const unsigned char *buf, struct gsm_dlci *dlci = tty->driver_data; if (dlci->state == DLCI_CLOSED) return -EINVAL; +pm_runtime_get_sync_tty(dlci->gsm->tty); /* Stuff the bytes into the fifo queue */ sent = kfifo_in_locked(dlci->fifo, buf, len, >lock); /* Need to kick the channel */ gsm_dlci_data_kick(dlci); +pm_runtime_put_tty(dlci->gsm->tty); return sent; } -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/3] cdc-acm: call usb_autopm_get_interface_upgrade in, acm_tty_write
acm device might be used as ldisc device by n_gsm driver. gsmtty_write and other gsm functions calls acm_tty_write indirectly while they holds spinlocks. Meanwhile, application might access ACM tty device directly. Here we choose to call usb_autopm_get_interface_upgrade instead of usb_autopm_get_interface_async to make sure above 2 scenarios can work well. Signed-off-by: Zhang Yanmin --- --- drivers/usb/class/cdc-acm.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c index 5c8f581..6ad85a3 100644 --- a/drivers/usb/class/cdc-acm.c +++ b/drivers/usb/class/cdc-acm.c @@ -689,10 +689,15 @@ static int acm_tty_write(struct tty_struct *tty, dev_vdbg(>data->dev, "%s - count %d\n", __func__, count); + stat = usb_autopm_get_interface_upgrade(acm->control); + if (stat) + return stat; + spin_lock_irqsave(>write_lock, flags); wbn = acm_wb_alloc(acm); if (wbn < 0) { spin_unlock_irqrestore(>write_lock, flags); + usb_autopm_put_interface(acm->control); return 0; } wb = >wb[wbn]; @@ -700,6 +705,7 @@ static int acm_tty_write(struct tty_struct *tty, if (!acm->dev) { wb->use = 0; spin_unlock_irqrestore(>write_lock, flags); + usb_autopm_put_interface(acm->control); return -ENODEV; } @@ -708,13 +714,6 @@ static int acm_tty_write(struct tty_struct *tty, memcpy(wb->buf, buf, count); wb->len = count; - stat = usb_autopm_get_interface_async(acm->control); - if (stat) { - wb->use = 0; - spin_unlock_irqrestore(>write_lock, flags); - return stat; - } - if (acm->susp_count) { usb_anchor_urb(wb->urb, >delayed); spin_unlock_irqrestore(>write_lock, flags); -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/3] usb: add function usb_autopm_get_interface_upgrade
Some usb driver has a specific requirement. Their critical functions might be called under both atomic environment and non-atomic environment. If it's under atomic environment, the driver can wake up the device by calling pm_runtime_get_sync directly. If it's under non-atomic environment, the function's caller need wake up the device before the function accesses the device. The patch adds usb_autopm_get_interface_upgrade, a new function to support above capability. Signed-off-by: Zhang Yanmin --- drivers/usb/core/driver.c | 53 +++ include/linux/usb.h | 3 +++ 2 files changed, 56 insertions(+) diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c index 818369a..d07fd8d 100644 --- a/drivers/usb/core/driver.c +++ b/drivers/usb/core/driver.c @@ -1684,6 +1684,59 @@ int usb_autopm_get_interface(struct usb_interface *intf) EXPORT_SYMBOL_GPL(usb_autopm_get_interface); /** + * usb_autopm_get_interface_retry - increment a USB interface's PM-usage counter + * @intf: the usb_interface whose counter should be incremented + * + * This routine should be called by an interface driver when it wants to + * use @intf and needs to guarantee that it is not suspended. In addition, + * the routine prevents @intf from being autosuspended subsequently. (Note + * that this will not prevent suspend events originating in the PM core.) + * This prevention will persist until usb_autopm_put_interface() is called + * or @intf is unbound. A typical example would be a character-device + * driver when its device file is opened. + * + * Comparing with usb_autopm_get_interface, usb_autopm_get_interface_upgrade + * is more careful when resuming the device. + * 1) The caller's caller already resumes resume the device and hold spinlocks. + * usb_autopm_get_interface_upgrade couldn't call pm_runtime_get_sync; + * 2) The caller's caller doesn't resume the device. + * usb_autopm_get_interface_upgrade has to resume the device before going ahead. + * + * @intf's usage counter is incremented to prevent subsequent autosuspends. + * However if the autoresume fails then the counter is re-decremented. + * + * This routine can run only in process context. + * + * Return: 0 on success. + */ +int usb_autopm_get_interface_upgrade(struct usb_interface *intf) +{ +intstatus = 0; + +pm_runtime_get(>dev); +if (!pm_runtime_active(>dev)) { +/* If not active, next _get_sync wakes device up*/ +status = pm_runtime_get_sync(>dev); +/* + * If it's active, next _put_sync wouldn't + * really put it to sleep as the 1st _get + * keeps the device active. + */ +pm_runtime_put_sync(>dev); +if (status < 0) +pm_runtime_put(>dev); +} else +atomic_inc(>pm_usage_cnt); +dev_vdbg(>dev, "%s: cnt %d -> %d\n", +__func__, atomic_read(>dev.power.usage_count), +status); +if (status > 0) +status = 0; +return status; +} +EXPORT_SYMBOL_GPL(usb_autopm_get_interface_upgrade); + +/** * usb_autopm_get_interface_async - increment a USB interface's PM-usage counter * @intf: the usb_interface whose counter should be incremented * diff --git a/include/linux/usb.h b/include/linux/usb.h index 447fe29..0a8a44a 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -663,6 +663,7 @@ extern void usb_enable_autosuspend(struct usb_device *udev); extern void usb_disable_autosuspend(struct usb_device *udev); extern int usb_autopm_get_interface(struct usb_interface *intf); +extern int usb_autopm_get_interface_upgrade(struct usb_interface *intf); extern void usb_autopm_put_interface(struct usb_interface *intf); extern int usb_autopm_get_interface_async(struct usb_interface *intf); extern void usb_autopm_put_interface_async(struct usb_interface *intf); @@ -683,6 +684,8 @@ static inline int usb_disable_autosuspend(struct usb_device *udev) static inline int usb_autopm_get_interface(struct usb_interface *intf) { return 0; } +static inline int usb_autopm_get_interface_upgrade(struct usb_interface *intf) +{ return 0; } static inline int usb_autopm_get_interface_async(struct usb_interface *intf) { return 0; } -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/3] cdc-acm: fix incorrect runtime wakeup in acm_tty_write
There is a scenario about cdc-acm utilization.Application opens n_gsm tty and cdc-acm tty. cdc-acm tty connects to xhci device. The application configures cdc-adm tty to n_gsm tty as ldisc tty. n_gsm=>cdc-acm=>xhci driver acm_tty_write can be called from n_gsm driver by ldisc connection, and from application when application opens cdc-acm tty directly. acm_tty_write wakes up the device by calling usb_autopm_get_interface_async, which calls pm_runtime_get. However, pm_runtime_get can't wake up the device before returning as it's an async wake up. Then, acm_tty_write might access the device when it is off. The patchset fixes it by: 1) add a new function usb_autopm_get_interface_upgrade to deal with above 2 requirements; 2) wake up device in n_gsm driver if n_gsm drivers calls cdc-acm driver; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/3] n_gsm: wake up ldisc tty before using it
Wake up ldisc device before calling its driver to access the device. Signed-off-by: Zhang Yanmin yanmin.zh...@intel.com --- drivers/tty/n_gsm.c | 40 +++- 1 file changed, 39 insertions(+), 1 deletion(-) diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index 2c34c32..40671fa 100644 --- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -62,6 +62,7 @@ #include linux/netdevice.h #include linux/etherdevice.h #include linux/gsmmux.h +#include linux/pm_runtime.h static int debug; module_param(debug, int, 0600); @@ -555,6 +556,27 @@ static int gsm_stuff_frame(const u8 *input, u8 *output, int len) return olen; } +static int pm_runtime_get_sync_tty(struct tty_struct *tty) +{ +int ret = 0; + +/*Wakeup parent as tty itself doesn't enable runtime*/ +if (tty-dev-parent) +ret = pm_runtime_get_sync(tty-dev-parent); + +return ret; +} + +static int pm_runtime_put_tty(struct tty_struct *tty) +{ +int ret = 0; + +if (tty-dev-parent) +ret = pm_runtime_put(tty-dev-parent); + +return ret; +} + /** *gsm_send-send a control frame *@gsm: our GSM mux @@ -1511,7 +1533,9 @@ static void gsm_dlci_begin_open(struct gsm_dlci *dlci) return; dlci-retries = gsm-n2; dlci-state = DLCI_OPENING; +pm_runtime_get_sync_tty(gsm-tty); gsm_command(dlci-gsm, dlci-addr, SABM|PF); +pm_runtime_put_tty(gsm-tty); mod_timer(dlci-t1, jiffies + gsm-t1 * HZ / 100); } @@ -1533,7 +1557,9 @@ static void gsm_dlci_begin_close(struct gsm_dlci *dlci) return; dlci-retries = gsm-n2; dlci-state = DLCI_CLOSING; +pm_runtime_get_sync_tty(gsm-tty); gsm_command(dlci-gsm, dlci-addr, DISC|PF); +pm_runtime_put_tty(gsm-tty); mod_timer(dlci-t1, jiffies + gsm-t1 * HZ / 100); } @@ -2286,7 +2312,9 @@ static void gsmld_receive_buf(struct tty_struct *tty, const unsigned char *cp, flags = *f++; switch (flags) { case TTY_NORMAL: +pm_runtime_get_sync_tty(gsm-tty); gsm-receive(gsm, *dp); +pm_runtime_put_tty(gsm-tty); break; case TTY_OVERRUN: case TTY_BREAK: @@ -2957,6 +2985,7 @@ static int gsmtty_open(struct tty_struct *tty, struct file *filp) { struct gsm_dlci *dlci = tty-driver_data; struct tty_port *port = dlci-port; +int ret; port-count++; tty_port_tty_set(port, tty); @@ -2968,7 +2997,11 @@ static int gsmtty_open(struct tty_struct *tty, struct file *filp) /* Start sending off SABM messages */ gsm_dlci_begin_open(dlci); /* And wait for virtual carrier */ -return tty_port_block_til_ready(port, tty, filp); +pm_runtime_get_sync_tty(dlci-gsm-tty); +ret = tty_port_block_til_ready(port, tty, filp); +pm_runtime_put_tty(dlci-gsm-tty); + +return ret; } static void gsmtty_close(struct tty_struct *tty, struct file *filp) @@ -2986,11 +3019,14 @@ static void gsmtty_close(struct tty_struct *tty, struct file *filp) gsm = dlci-gsm; if (tty_port_close_start(dlci-port, tty, filp) == 0) return; +pm_runtime_get_sync_tty(gsm-tty); gsm_dlci_begin_close(dlci); if (test_bit(ASYNCB_INITIALIZED, dlci-port.flags)) { if (C_HUPCL(tty)) tty_port_lower_dtr_rts(dlci-port); } +pm_runtime_put_tty(gsm-tty); + tty_port_close_end(dlci-port, tty); tty_port_tty_set(dlci-port, NULL); return; @@ -3012,10 +3048,12 @@ static int gsmtty_write(struct tty_struct *tty, const unsigned char *buf, struct gsm_dlci *dlci = tty-driver_data; if (dlci-state == DLCI_CLOSED) return -EINVAL; +pm_runtime_get_sync_tty(dlci-gsm-tty); /* Stuff the bytes into the fifo queue */ sent = kfifo_in_locked(dlci-fifo, buf, len, dlci-lock); /* Need to kick the channel */ gsm_dlci_data_kick(dlci); +pm_runtime_put_tty(dlci-gsm-tty); return sent; } -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/3] cdc-acm: call usb_autopm_get_interface_upgrade in, acm_tty_write
acm device might be used as ldisc device by n_gsm driver. gsmtty_write and other gsm functions calls acm_tty_write indirectly while they holds spinlocks. Meanwhile, application might access ACM tty device directly. Here we choose to call usb_autopm_get_interface_upgrade instead of usb_autopm_get_interface_async to make sure above 2 scenarios can work well. Signed-off-by: Zhang Yanmin yanmin.zh...@intel.com --- --- drivers/usb/class/cdc-acm.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c index 5c8f581..6ad85a3 100644 --- a/drivers/usb/class/cdc-acm.c +++ b/drivers/usb/class/cdc-acm.c @@ -689,10 +689,15 @@ static int acm_tty_write(struct tty_struct *tty, dev_vdbg(acm-data-dev, %s - count %d\n, __func__, count); + stat = usb_autopm_get_interface_upgrade(acm-control); + if (stat) + return stat; + spin_lock_irqsave(acm-write_lock, flags); wbn = acm_wb_alloc(acm); if (wbn 0) { spin_unlock_irqrestore(acm-write_lock, flags); + usb_autopm_put_interface(acm-control); return 0; } wb = acm-wb[wbn]; @@ -700,6 +705,7 @@ static int acm_tty_write(struct tty_struct *tty, if (!acm-dev) { wb-use = 0; spin_unlock_irqrestore(acm-write_lock, flags); + usb_autopm_put_interface(acm-control); return -ENODEV; } @@ -708,13 +714,6 @@ static int acm_tty_write(struct tty_struct *tty, memcpy(wb-buf, buf, count); wb-len = count; - stat = usb_autopm_get_interface_async(acm-control); - if (stat) { - wb-use = 0; - spin_unlock_irqrestore(acm-write_lock, flags); - return stat; - } - if (acm-susp_count) { usb_anchor_urb(wb-urb, acm-delayed); spin_unlock_irqrestore(acm-write_lock, flags); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/3] usb: add function usb_autopm_get_interface_upgrade
Some usb driver has a specific requirement. Their critical functions might be called under both atomic environment and non-atomic environment. If it's under atomic environment, the driver can wake up the device by calling pm_runtime_get_sync directly. If it's under non-atomic environment, the function's caller need wake up the device before the function accesses the device. The patch adds usb_autopm_get_interface_upgrade, a new function to support above capability. Signed-off-by: Zhang Yanmin yanmin.zh...@intel.com --- drivers/usb/core/driver.c | 53 +++ include/linux/usb.h | 3 +++ 2 files changed, 56 insertions(+) diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c index 818369a..d07fd8d 100644 --- a/drivers/usb/core/driver.c +++ b/drivers/usb/core/driver.c @@ -1684,6 +1684,59 @@ int usb_autopm_get_interface(struct usb_interface *intf) EXPORT_SYMBOL_GPL(usb_autopm_get_interface); /** + * usb_autopm_get_interface_retry - increment a USB interface's PM-usage counter + * @intf: the usb_interface whose counter should be incremented + * + * This routine should be called by an interface driver when it wants to + * use @intf and needs to guarantee that it is not suspended. In addition, + * the routine prevents @intf from being autosuspended subsequently. (Note + * that this will not prevent suspend events originating in the PM core.) + * This prevention will persist until usb_autopm_put_interface() is called + * or @intf is unbound. A typical example would be a character-device + * driver when its device file is opened. + * + * Comparing with usb_autopm_get_interface, usb_autopm_get_interface_upgrade + * is more careful when resuming the device. + * 1) The caller's caller already resumes resume the device and hold spinlocks. + * usb_autopm_get_interface_upgrade couldn't call pm_runtime_get_sync; + * 2) The caller's caller doesn't resume the device. + * usb_autopm_get_interface_upgrade has to resume the device before going ahead. + * + * @intf's usage counter is incremented to prevent subsequent autosuspends. + * However if the autoresume fails then the counter is re-decremented. + * + * This routine can run only in process context. + * + * Return: 0 on success. + */ +int usb_autopm_get_interface_upgrade(struct usb_interface *intf) +{ +intstatus = 0; + +pm_runtime_get(intf-dev); +if (!pm_runtime_active(intf-dev)) { +/* If not active, next _get_sync wakes device up*/ +status = pm_runtime_get_sync(intf-dev); +/* + * If it's active, next _put_sync wouldn't + * really put it to sleep as the 1st _get + * keeps the device active. + */ +pm_runtime_put_sync(intf-dev); +if (status 0) +pm_runtime_put(intf-dev); +} else +atomic_inc(intf-pm_usage_cnt); +dev_vdbg(intf-dev, %s: cnt %d - %d\n, +__func__, atomic_read(intf-dev.power.usage_count), +status); +if (status 0) +status = 0; +return status; +} +EXPORT_SYMBOL_GPL(usb_autopm_get_interface_upgrade); + +/** * usb_autopm_get_interface_async - increment a USB interface's PM-usage counter * @intf: the usb_interface whose counter should be incremented * diff --git a/include/linux/usb.h b/include/linux/usb.h index 447fe29..0a8a44a 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -663,6 +663,7 @@ extern void usb_enable_autosuspend(struct usb_device *udev); extern void usb_disable_autosuspend(struct usb_device *udev); extern int usb_autopm_get_interface(struct usb_interface *intf); +extern int usb_autopm_get_interface_upgrade(struct usb_interface *intf); extern void usb_autopm_put_interface(struct usb_interface *intf); extern int usb_autopm_get_interface_async(struct usb_interface *intf); extern void usb_autopm_put_interface_async(struct usb_interface *intf); @@ -683,6 +684,8 @@ static inline int usb_disable_autosuspend(struct usb_device *udev) static inline int usb_autopm_get_interface(struct usb_interface *intf) { return 0; } +static inline int usb_autopm_get_interface_upgrade(struct usb_interface *intf) +{ return 0; } static inline int usb_autopm_get_interface_async(struct usb_interface *intf) { return 0; } -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/3] cdc-acm: fix incorrect runtime wakeup in acm_tty_write
There is a scenario about cdc-acm utilization.Application opens n_gsm tty and cdc-acm tty. cdc-acm tty connects to xhci device. The application configures cdc-adm tty to n_gsm tty as ldisc tty. n_gsm=cdc-acm=xhci driver acm_tty_write can be called from n_gsm driver by ldisc connection, and from application when application opens cdc-acm tty directly. acm_tty_write wakes up the device by calling usb_autopm_get_interface_async, which calls pm_runtime_get. However, pm_runtime_get can't wake up the device before returning as it's an async wake up. Then, acm_tty_write might access the device when it is off. The patchset fixes it by: 1) add a new function usb_autopm_get_interface_upgrade to deal with above 2 requirements; 2) wake up device in n_gsm driver if n_gsm drivers calls cdc-acm driver; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] debugfs: keep the old valid mode value when no explicity specify it
On 2014/8/28 23:10, Greg KH wrote: On Thu, Aug 28, 2014 at 06:09:09PM +0800, Chen LinX wrote: From: "Chen, LinX" When mount debugfs with no mode specifed after it's mounted, the mount point mode will change to default mode(0700) even the mount operation was fail, this will cause some issues like can't get binder info in android. I don't understand, what did you do to get into this state? Greg, Thanks for your kind quick comments. The patch description can be improved. We hit the issue when debugging a UIWDT issue. Android framework has a good method to detect userspace hang and reports UIWDT issues. Android uses client/server model. Clients communicates with servers by binder. binder has debugfs interfaces. Some files show what threads are communicating with what other threads. If one thread is blocked for a long time, we can find the blocking chain from the binder info. Since the error dumping process has no root access, booting process changes debugfs mount dir mode to 0755. When UIWDT happens, the error dumping process can read the info. Unfortunately, some other scripts at booting try to mount debugfs for many times. No matter if the double mounting fails or succeeds, debugfs_parse_options changes the root inode's mode back to the default 0700. It means the effect of previous mode changing to 0755 is lost. At UIWDT, the dumping process can't save binder info to disk log files. And why does binder care about debugfs? Here we can keep the old valid mode if no explicity specify the mode value and also change the mode value even the mount fails if the mode value is specified. I can't parse this sentence :( Lin's patch checks if debugfs_mount_opts->mode is initiated. If it is already, the patch would bypass the reinitiation, so the old mode value is kept. Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] debugfs: keep the old valid mode value when no explicity specify it
On 2014/8/28 23:10, Greg KH wrote: On Thu, Aug 28, 2014 at 06:09:09PM +0800, Chen LinX wrote: From: Chen, LinX linx.z.c...@intel.com When mount debugfs with no mode specifed after it's mounted, the mount point mode will change to default mode(0700) even the mount operation was fail, this will cause some issues like can't get binder info in android. I don't understand, what did you do to get into this state? Greg, Thanks for your kind quick comments. The patch description can be improved. We hit the issue when debugging a UIWDT issue. Android framework has a good method to detect userspace hang and reports UIWDT issues. Android uses client/server model. Clients communicates with servers by binder. binder has debugfs interfaces. Some files show what threads are communicating with what other threads. If one thread is blocked for a long time, we can find the blocking chain from the binder info. Since the error dumping process has no root access, booting process changes debugfs mount dir mode to 0755. When UIWDT happens, the error dumping process can read the info. Unfortunately, some other scripts at booting try to mount debugfs for many times. No matter if the double mounting fails or succeeds, debugfs_parse_options changes the root inode's mode back to the default 0700. It means the effect of previous mode changing to 0755 is lost. At UIWDT, the dumping process can't save binder info to disk log files. And why does binder care about debugfs? Here we can keep the old valid mode if no explicity specify the mode value and also change the mode value even the mount fails if the mode value is specified. I can't parse this sentence :( Lin's patch checks if debugfs_mount_opts-mode is initiated. If it is already, the patch would bypass the reinitiation, so the old mode value is kept. Yanmin -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] export the function kmap_flush_unused.
On 2014/8/11 19:54, Peter Zijlstra wrote: On Mon, Aug 11, 2014 at 01:26:45AM +, Sha, Ruibin wrote: Hi Chintan, Thank you very much for your timely and kindly response and comments. Here is more detail about our Scenario: We have a big driver on Android product. The driver allocates lots of DDR pages. When applications mmap a file exported from the driver, driver would mmap the pages to the application space, usually with uncachable prot. On ia32/x86_64 arch, we have to avoid page cache alias issue. When driver allocates the pages, it would change page original mapping in page table with uncachable prot. Sometimes, the allocated page was used by kmap/kunmap. After kunmap, the page is still mapped in KMAP space. The entries in KMAP page table are not cleaned up until a kernel thread flushes the freed KMAP pages(usually it is woken up by kunmap). It means the driver need force to flush the KMAP page table entries before mapping pages to application space to be used. Otherwise, there is a race to create cache alias. To resolve this issue, we need export function kmap_flush_unused as the driver is compiled as module. Then, the driver calls kmap_flush_unused if the allocated pages are in HIGHMEM and being used by kmap. A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing in e-mail? Sorry, Peter. Ruibin is a new guy in LKML community. He uses outlook to send emails. He would improve that. That said, it sounds like you want set_memory_() to call kmap_flush_unused(). Because this race it not at all specific to your usage, it could happen to any set_memory_() site, right? No. set_memory_() assumes the memory is not in HIGHMEM. This scenario is driver allocates HIGHMEM pages, which are kmapped before. Kernel uses a lazy method when kunmap a HIGHMEM page. The pages are not unmapped from KMAP page table entries immediately. When next kmap calling uses the same entry, kernel would change pte. Or when change_page_attr_set_clr is called. Our big driver doesn't call change_page_attr_set_clr when mmap the pages with UNCACHABLE prot. It need call kmap_flush_unused directly after allocating HIGHMEM pages. Thanks for the kind comments. Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] export the function kmap_flush_unused.
On 2014/8/11 19:54, Peter Zijlstra wrote: On Mon, Aug 11, 2014 at 01:26:45AM +, Sha, Ruibin wrote: Hi Chintan, Thank you very much for your timely and kindly response and comments. Here is more detail about our Scenario: We have a big driver on Android product. The driver allocates lots of DDR pages. When applications mmap a file exported from the driver, driver would mmap the pages to the application space, usually with uncachable prot. On ia32/x86_64 arch, we have to avoid page cache alias issue. When driver allocates the pages, it would change page original mapping in page table with uncachable prot. Sometimes, the allocated page was used by kmap/kunmap. After kunmap, the page is still mapped in KMAP space. The entries in KMAP page table are not cleaned up until a kernel thread flushes the freed KMAP pages(usually it is woken up by kunmap). It means the driver need force to flush the KMAP page table entries before mapping pages to application space to be used. Otherwise, there is a race to create cache alias. To resolve this issue, we need export function kmap_flush_unused as the driver is compiled as module. Then, the driver calls kmap_flush_unused if the allocated pages are in HIGHMEM and being used by kmap. A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing in e-mail? Sorry, Peter. Ruibin is a new guy in LKML community. He uses outlook to send emails. He would improve that. That said, it sounds like you want set_memory_() to call kmap_flush_unused(). Because this race it not at all specific to your usage, it could happen to any set_memory_() site, right? No. set_memory_() assumes the memory is not in HIGHMEM. This scenario is driver allocates HIGHMEM pages, which are kmapped before. Kernel uses a lazy method when kunmap a HIGHMEM page. The pages are not unmapped from KMAP page table entries immediately. When next kmap calling uses the same entry, kernel would change pte. Or when change_page_attr_set_clr is called. Our big driver doesn't call change_page_attr_set_clr when mmap the pages with UNCACHABLE prot. It need call kmap_flush_unused directly after allocating HIGHMEM pages. Thanks for the kind comments. Yanmin -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] export the function kmap_flush_unused.
On 2014/8/11 9:26, Sha, Ruibin wrote: Hi Chintan, Thank you very much for your timely and kindly response and comments. Here is more detail about our Scenario: We have a big driver on Android product. The driver allocates lots of DDR pages. When applications mmap a file exported from the driver, driver would mmap the pages to the application space, usually with uncachable prot. On ia32/x86_64 arch, we have to avoid page cache alias issue. When driver allocates the pages, it would change page original mapping in page table with uncachable prot. Sometimes, the allocated page was used by kmap/kunmap. After kunmap, the page is still mapped in KMAP space. The entries in KMAP page table are not cleaned up until a kernel thread flushes the freed KMAP pages(usually it is woken up by kunmap). It means the driver need force to flush the KMAP page table entries before mapping pages to application space to be used. Otherwise, there is a race to create cache alias. To resolve this issue, we need export function kmap_flush_unused as the driver is compiled as module. Then, the driver calls kmap_flush_unused if the allocated pages are in HIGHMEM and being used by kmap. Thanks again! Best Regards --- Sha, Rui bin ( Robin ) +86 13817890945 Android System Integration Shanghai -Original Message- From: Chintan Pandya [mailto:cpan...@codeaurora.org] Sent: Friday, August 8, 2014 9:40 PM To: Sha, Ruibin Cc: linux-kernel@vger.kernel.org; linux...@kvack.org; m...@csn.ul.ie; a.p.zijls...@chello.nl; mgor...@suse.de; mi...@redhat.com; Zhang, Yanmin; He, Bo Subject: Re: [PATCH] export the function kmap_flush_unused. On 08/08/2014 02:46 PM, Sha, Ruibin wrote: export the function kmap_flush_unused. Scenario: When graphic driver need high memory spece, we use alloc_pages() to allocate. But if the allocated page has just been mapped in the KMAP space(like first kmap then kunmap) and no flush page happened on PKMAP, the page virtual address is not NULL.Then when we get that page and set page attribute like set_memory_uc and set_memory_wc, we hit error. Could you explain your scenario with more details ? set_memory_* should be applied on mapped address. And in attempt to map your page (which was just kmap and kunmap'ed), it will overwrite the previous mappings. Moreover, in my view, kmap_flush_unused is just helping us in keeping the cache clean for kmap virtual addresses if they are unmapped. Is it serving any more purpose here ? It depends on how to define 'clean' here. It resets pkmap_count[i] to 0, and cleans up page table entries used by PKMAP. Here, our scenario is caused by the late page table entry cleanup as driver need avoid page cache alias. fix: For that scenario,when we get the allocated page and its virtual address is not NULL, we would like first flush that page. So need export that function kmap_flush_unused. Signed-off-by: sha, ruibin --- mm/highmem.c | 1 + 1 file changed, 1 insertion(+) diff --git a/mm/highmem.c b/mm/highmem.c index b32b70c..511299b 100644 --- a/mm/highmem.c +++ b/mm/highmem.c @@ -156,6 +156,7 @@ void kmap_flush_unused(void) flush_all_zero_pkmaps(); unlock_kmap(); } +EXPORT_SYMBOL(kmap_flush_unused); This symbol is already extern'ed. Is it not sufficient for your case ? We want to call it in driver module. extern is not enough. Thanks, Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] export the function kmap_flush_unused.
On 2014/8/11 9:26, Sha, Ruibin wrote: Hi Chintan, Thank you very much for your timely and kindly response and comments. Here is more detail about our Scenario: We have a big driver on Android product. The driver allocates lots of DDR pages. When applications mmap a file exported from the driver, driver would mmap the pages to the application space, usually with uncachable prot. On ia32/x86_64 arch, we have to avoid page cache alias issue. When driver allocates the pages, it would change page original mapping in page table with uncachable prot. Sometimes, the allocated page was used by kmap/kunmap. After kunmap, the page is still mapped in KMAP space. The entries in KMAP page table are not cleaned up until a kernel thread flushes the freed KMAP pages(usually it is woken up by kunmap). It means the driver need force to flush the KMAP page table entries before mapping pages to application space to be used. Otherwise, there is a race to create cache alias. To resolve this issue, we need export function kmap_flush_unused as the driver is compiled as module. Then, the driver calls kmap_flush_unused if the allocated pages are in HIGHMEM and being used by kmap. Thanks again! Best Regards --- Sha, Rui bin ( Robin ) +86 13817890945 Android System Integration Shanghai -Original Message- From: Chintan Pandya [mailto:cpan...@codeaurora.org] Sent: Friday, August 8, 2014 9:40 PM To: Sha, Ruibin Cc: linux-kernel@vger.kernel.org; linux...@kvack.org; m...@csn.ul.ie; a.p.zijls...@chello.nl; mgor...@suse.de; mi...@redhat.com; Zhang, Yanmin; He, Bo Subject: Re: [PATCH] export the function kmap_flush_unused. On 08/08/2014 02:46 PM, Sha, Ruibin wrote: export the function kmap_flush_unused. Scenario: When graphic driver need high memory spece, we use alloc_pages() to allocate. But if the allocated page has just been mapped in the KMAP space(like first kmap then kunmap) and no flush page happened on PKMAP, the page virtual address is not NULL.Then when we get that page and set page attribute like set_memory_uc and set_memory_wc, we hit error. Could you explain your scenario with more details ? set_memory_* should be applied on mapped address. And in attempt to map your page (which was just kmap and kunmap'ed), it will overwrite the previous mappings. Moreover, in my view, kmap_flush_unused is just helping us in keeping the cache clean for kmap virtual addresses if they are unmapped. Is it serving any more purpose here ? It depends on how to define 'clean' here. It resets pkmap_count[i] to 0, and cleans up page table entries used by PKMAP. Here, our scenario is caused by the late page table entry cleanup as driver need avoid page cache alias. fix: For that scenario,when we get the allocated page and its virtual address is not NULL, we would like first flush that page. So need export that function kmap_flush_unused. Signed-off-by: sha, ruibin ruibin@intel.com --- mm/highmem.c | 1 + 1 file changed, 1 insertion(+) diff --git a/mm/highmem.c b/mm/highmem.c index b32b70c..511299b 100644 --- a/mm/highmem.c +++ b/mm/highmem.c @@ -156,6 +156,7 @@ void kmap_flush_unused(void) flush_all_zero_pkmaps(); unlock_kmap(); } +EXPORT_SYMBOL(kmap_flush_unused); This symbol is already extern'ed. Is it not sufficient for your case ? We want to call it in driver module. extern is not enough. Thanks, Yanmin -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH] md/dm-ioctl.c: optimize memory allocation in copy_params
On 2014/7/24 1:14, Mikulas Patocka wrote: On Wed, 23 Jul 2014, Alasdair G Kergon wrote: On Wed, Jul 23, 2014 at 08:16:58AM -0400, Mikulas Patocka wrote: So, it means that you do not use device mapper at all. So, why are you trying to change memory allocation in device mapper? So the *test* they run is asking device-mapper to briefly reserve a 64KB buffer when there is no data to report: The answer is not to run that pointless test:) And if a single 64KB allocation really is a big deal, then patch 'vold' in userspace so it doesn't ask for 64KB when it clearly doesn't need it! + int Devmapper::dumpState(SocketClient *c) { +char *buffer = (char *) malloc(1024 * 64); The code has just #define DEVMAPPER_BUFFER_SIZE 4096 for all the other dm ioctls it issues. Only use a larger value when it is needed i.e. if DM_BUFFER_FULL_FLAG gets set. Alasdair Device mapper shouldn't depend on allocation on any contiguous memory - it will fall back to vmalloc. I still can't believe that their suggested patch makes any difference. This pattern is being repeated over and over in the kernel - for example: if (PIDLIST_TOO_LARGE(count)) return vmalloc(count * sizeof(pid_t)); else return kmalloc(count * sizeof(pid_t), GFP_KERNEL); if (is_vmalloc_addr(p)) vfree(p); else kfree(p); - I think we should make two functions that do this operation (for example kvalloc and kvfree) and convert device mapper and other users to these functions. Then, other kernel subsystems can start to use them to fix memory fragmentation issues. Thank Mikulas and Alasdair. Before sending out the patch, we know the result. :) It's hard to balance between performance and stability. Anyway, we would try to change seq_read. Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH] md/dm-ioctl.c: optimize memory allocation in copy_params
On 2014/7/24 1:14, Mikulas Patocka wrote: On Wed, 23 Jul 2014, Alasdair G Kergon wrote: On Wed, Jul 23, 2014 at 08:16:58AM -0400, Mikulas Patocka wrote: So, it means that you do not use device mapper at all. So, why are you trying to change memory allocation in device mapper? So the *test* they run is asking device-mapper to briefly reserve a 64KB buffer when there is no data to report: The answer is not to run that pointless test:) And if a single 64KB allocation really is a big deal, then patch 'vold' in userspace so it doesn't ask for 64KB when it clearly doesn't need it! + int Devmapper::dumpState(SocketClient *c) { +char *buffer = (char *) malloc(1024 * 64); The code has just #define DEVMAPPER_BUFFER_SIZE 4096 for all the other dm ioctls it issues. Only use a larger value when it is needed i.e. if DM_BUFFER_FULL_FLAG gets set. Alasdair Device mapper shouldn't depend on allocation on any contiguous memory - it will fall back to vmalloc. I still can't believe that their suggested patch makes any difference. This pattern is being repeated over and over in the kernel - for example: if (PIDLIST_TOO_LARGE(count)) return vmalloc(count * sizeof(pid_t)); else return kmalloc(count * sizeof(pid_t), GFP_KERNEL); if (is_vmalloc_addr(p)) vfree(p); else kfree(p); - I think we should make two functions that do this operation (for example kvalloc and kvfree) and convert device mapper and other users to these functions. Then, other kernel subsystems can start to use them to fix memory fragmentation issues. Thank Mikulas and Alasdair. Before sending out the patch, we know the result. :) It's hard to balance between performance and stability. Anyway, we would try to change seq_read. Yanmin -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH] md/dm-ioctl.c: optimize memory allocation in copy_params
On 2014/7/22 10:04, Alasdair G Kergon wrote: On Tue, Jul 22, 2014 at 02:23:52AM +0100, Alasdair G Kergon wrote: Unanswered. Let's ask the same question in a different way: A quick search for 'vold' returns: https://android.googlesource.com/platform/system/vold/ and the code there requests a fixed 64K allocation to hold the names of active volumes. So unlike libdevmapper-based applications where a smaller allocation is used at first and only extended if needed, Android just assumes that 64KB is enough for everyone. So is your claim that: 1. This 64KB allocation for the brief duration of the ioctl to store the names of active device-mapper volumes leads to memory problems? [Mustn't the system *already* be in a bad state if this pushes it over the limit?] It's a good question. 1) Usually, Android mobile runs for a long time. It's very command that users don't turn off the phones for months. Users might start lots of applications. memory is used up in the end. Kernel might recollect memory over and over again. 2) We never blames this Out of memory issue fully to DM. 3) We want to improve the OOM issue. and 2. The systems on which this memory shortage occurs have so many volumes (with long names?) that a smaller allocation would not suffice? 64K is small, comparing with 2GB, even 4GB total memory. However, this 64K by kmalloc has a strict criteria. It might be continuous physical memory and align with 64K. If memory is used up, freed memory is very fragmented. Such 64K criteria is a hard request. Usually, driver can allocate such memory at booting initialization. After that, it should allocate many sparse pages instead of continuous memory. Here 64K allocation is after booting. Thanks for the kind comments. Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH] md/dm-ioctl.c: optimize memory allocation in copy_params
On 2014/7/22 9:23, Alasdair G Kergon wrote: On 2014/7/9 6:39, Mikulas Patocka wrote: Which ioctl with more than 16kB arguments do you use? Unanswered. Let's ask the same question in a different way: Please supply the output of these three commands on the real-world system on which you believe that that particular allocation is causing you a problem: dmsetup info -c dmsetup table dmsetup status Android doesn't include the command. We compiled lvm2-2_02_107.tar.gz and copy dmsetup to the board. But command reports No devices. # /data/bin/dmsetup info No devices found # cat /proc/devices Character devices: 1 mem 4 /dev/vc/0 4 tty 4 ttyS 4 ttyMFD 5 /dev/tty 5 /dev/console 5 /dev/ptmx 7 vcs 10 misc 13 input 21 sg 29 fb 81 video4linux 89 i2c 108 ppp 116 alsa 128 ptm 136 pts 153 spi 166 ttyACM 180 usb 188 ttyUSB 189 usb_device 202 cpu/msr 203 cpu/cpuid 226 drm 241 mdm_ctrl 242 sep54 243 roccat 244 hidraw 245 ttyGS 246 usbmon 247 ttyPTI 248 gsmtty 249 bsg 250 iio 251 ptp 252 pps 253 media 254 rtc Block devices: 1 ramdisk 259 blkext 7 loop 8 sd 9 md 11 sr 65 sd 66 sd 67 sd 68 sd 69 sd 70 sd 71 sd 128 sd 129 sd 130 sd 131 sd 132 sd 133 sd 134 sd 135 sd 179 mmc 253 device-mapper 254 mdp -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH] md/dm-ioctl.c: optimize memory allocation in copy_params
On 2014/7/22 9:23, Alasdair G Kergon wrote: On 2014/7/9 6:39, Mikulas Patocka wrote: Which ioctl with more than 16kB arguments do you use? Unanswered. Let's ask the same question in a different way: Please supply the output of these three commands on the real-world system on which you believe that that particular allocation is causing you a problem: dmsetup info -c dmsetup table dmsetup status Android doesn't include the command. We compiled lvm2-2_02_107.tar.gz and copy dmsetup to the board. But command reports No devices. # /data/bin/dmsetup info No devices found # cat /proc/devices Character devices: 1 mem 4 /dev/vc/0 4 tty 4 ttyS 4 ttyMFD 5 /dev/tty 5 /dev/console 5 /dev/ptmx 7 vcs 10 misc 13 input 21 sg 29 fb 81 video4linux 89 i2c 108 ppp 116 alsa 128 ptm 136 pts 153 spi 166 ttyACM 180 usb 188 ttyUSB 189 usb_device 202 cpu/msr 203 cpu/cpuid 226 drm 241 mdm_ctrl 242 sep54 243 roccat 244 hidraw 245 ttyGS 246 usbmon 247 ttyPTI 248 gsmtty 249 bsg 250 iio 251 ptp 252 pps 253 media 254 rtc Block devices: 1 ramdisk 259 blkext 7 loop 8 sd 9 md 11 sr 65 sd 66 sd 67 sd 68 sd 69 sd 70 sd 71 sd 128 sd 129 sd 130 sd 131 sd 132 sd 133 sd 134 sd 135 sd 179 mmc 253 device-mapper 254 mdp -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH] md/dm-ioctl.c: optimize memory allocation in copy_params
On 2014/7/22 10:04, Alasdair G Kergon wrote: On Tue, Jul 22, 2014 at 02:23:52AM +0100, Alasdair G Kergon wrote: Unanswered. Let's ask the same question in a different way: A quick search for 'vold' returns: https://android.googlesource.com/platform/system/vold/ and the code there requests a fixed 64K allocation to hold the names of active volumes. So unlike libdevmapper-based applications where a smaller allocation is used at first and only extended if needed, Android just assumes that 64KB is enough for everyone. So is your claim that: 1. This 64KB allocation for the brief duration of the ioctl to store the names of active device-mapper volumes leads to memory problems? [Mustn't the system *already* be in a bad state if this pushes it over the limit?] It's a good question. 1) Usually, Android mobile runs for a long time. It's very command that users don't turn off the phones for months. Users might start lots of applications. memory is used up in the end. Kernel might recollect memory over and over again. 2) We never blames this Out of memory issue fully to DM. 3) We want to improve the OOM issue. and 2. The systems on which this memory shortage occurs have so many volumes (with long names?) that a smaller allocation would not suffice? 64K is small, comparing with 2GB, even 4GB total memory. However, this 64K by kmalloc has a strict criteria. It might be continuous physical memory and align with 64K. If memory is used up, freed memory is very fragmented. Such 64K criteria is a hard request. Usually, driver can allocate such memory at booting initialization. After that, it should allocate many sparse pages instead of continuous memory. Here 64K allocation is after booting. Thanks for the kind comments. Yanmin -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH] md/dm-ioctl.c: optimize memory allocation in copy_params
On 2014/7/9 22:53, Mikulas Patocka wrote: On Wed, 9 Jul 2014, Zhang, Yanmin wrote: On 2014/7/9 6:39, Mikulas Patocka wrote: Hi Mikulas, Thanks for your kind comments. I don't really know what is the purpose of this patch. In existing device mapper code, if kmalloc fails, the allocation is retried with __vmalloc. So there is no need to avoid kmalloc aritifically. kmalloc doesn't cause memory fragmentation. If the memory is too fragmented, kmalloc fails. If it isn't, it succeeds. But it doesn't cause memory being fragmented. I agree with you. The patch's original description is not appropriate. Basically, memory fragmentation is not caused by this kmalloc. The right description is: When memory is fragmented and most memory is used up, kmalloc a big memory might cause lots of OutOFMemory and system might kill lots of processes. Then, system might hang. Sorry for replying you too late. I am very busy in some other critical issues. The question is - does this particular kmalloc in device mapper cause out of memory or killing of other tasks? It has flags __GFP_NORETRY, When memory is fragmented, drivers need allocate small pages instead of big memory. Even with __GFP_NORETRY, driver might get a big memory by luck. That means other drivers would get fewer chances to fulfill their memory requests, such like allocating 2 pages for task_struct. Later on, OOM might happen. __GFP_NOMEMALLOC, __GFP_NOWARN so it shouldn't cause any trouble. It should just fail silently if memory is fragmented. It's hard to say this call causes out of memory. There are many such places in kernel to allocate big continuous memory. One is in seq_read, where we created a patch to use vmalloc instead of kmalloc to fix it, but got far worse comments as it's very old code. Another is in our own gfx driver. We want to fix all. We can't blame the OOM to just one place. Monkey testing is popular for Android development. We run the testing frequently. It might start lots of applications. Eventually, it is a comprehensive testing. Do you have some stacktrace that identifies this kmalloc as a problem? Sometimes, when OOM happens, kernel log shows some backtrace of big continuous memory allocation failure. Sometimes, when board can't respond and watchdog might reset the board after saving thread callchain into disk. Do this test - prepare two kernels that are identical, except that one kernel has that one-line change in dm-ioctl. Boot each kernel 10 times, do exactly the same operation after boot. Does the kernel with the patch always behave correctly and does the kernel without the patch always fail? No. Instead of just one, many places have impact on the OOM issue. Report the result - how many failures did you get with or without that one-line patch. Without such a test - I just don't believe that your patch makes any difference. Another question - your patch only makes change if some device mapper ioctl has more than 16kB arugments. Which ioctl with more than 16kB arguments do you use? Do you load such a big table to device mapper? How often do you call that ioctl with such big arguments? Xinhui's email mentions the ioctl details. In android, there is a command "dumpstate", it run many other commands to collect information. In our scenario, it run command "vdc dump", and vdc uses socket to pass some parameters to "vold", then vold generates ioctl. Thanks for your patience. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH] md/dm-ioctl.c: optimize memory allocation in copy_params
On 2014/7/9 22:53, Mikulas Patocka wrote: On Wed, 9 Jul 2014, Zhang, Yanmin wrote: On 2014/7/9 6:39, Mikulas Patocka wrote: Hi Mikulas, Thanks for your kind comments. I don't really know what is the purpose of this patch. In existing device mapper code, if kmalloc fails, the allocation is retried with __vmalloc. So there is no need to avoid kmalloc aritifically. kmalloc doesn't cause memory fragmentation. If the memory is too fragmented, kmalloc fails. If it isn't, it succeeds. But it doesn't cause memory being fragmented. I agree with you. The patch's original description is not appropriate. Basically, memory fragmentation is not caused by this kmalloc. The right description is: When memory is fragmented and most memory is used up, kmalloc a big memory might cause lots of OutOFMemory and system might kill lots of processes. Then, system might hang. Sorry for replying you too late. I am very busy in some other critical issues. The question is - does this particular kmalloc in device mapper cause out of memory or killing of other tasks? It has flags __GFP_NORETRY, When memory is fragmented, drivers need allocate small pages instead of big memory. Even with __GFP_NORETRY, driver might get a big memory by luck. That means other drivers would get fewer chances to fulfill their memory requests, such like allocating 2 pages for task_struct. Later on, OOM might happen. __GFP_NOMEMALLOC, __GFP_NOWARN so it shouldn't cause any trouble. It should just fail silently if memory is fragmented. It's hard to say this call causes out of memory. There are many such places in kernel to allocate big continuous memory. One is in seq_read, where we created a patch to use vmalloc instead of kmalloc to fix it, but got far worse comments as it's very old code. Another is in our own gfx driver. We want to fix all. We can't blame the OOM to just one place. Monkey testing is popular for Android development. We run the testing frequently. It might start lots of applications. Eventually, it is a comprehensive testing. Do you have some stacktrace that identifies this kmalloc as a problem? Sometimes, when OOM happens, kernel log shows some backtrace of big continuous memory allocation failure. Sometimes, when board can't respond and watchdog might reset the board after saving thread callchain into disk. Do this test - prepare two kernels that are identical, except that one kernel has that one-line change in dm-ioctl. Boot each kernel 10 times, do exactly the same operation after boot. Does the kernel with the patch always behave correctly and does the kernel without the patch always fail? No. Instead of just one, many places have impact on the OOM issue. Report the result - how many failures did you get with or without that one-line patch. Without such a test - I just don't believe that your patch makes any difference. Another question - your patch only makes change if some device mapper ioctl has more than 16kB arugments. Which ioctl with more than 16kB arguments do you use? Do you load such a big table to device mapper? How often do you call that ioctl with such big arguments? Xinhui's email mentions the ioctl details. In android, there is a command dumpstate, it run many other commands to collect information. In our scenario, it run command vdc dump, and vdc uses socket to pass some parameters to vold, then vold generates ioctl. Thanks for your patience. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf: Don't enable the perf_event without in PERF_ATTACH_CONTEXT status
On 2014/7/14 21:27, Peter Zijlstra wrote: On Thu, Jul 03, 2014 at 11:36:38AM +0800, Chen LinX wrote: From: "Chen LinX" when do cpu hotplug test and run below perf test together, pmu may access freed perf_event while true; do perf record -a -g -f sleep 10 rm perf.* done the scenario is that when cpu offline firstly, the 'perf_cpu_notify' will disable event on the pmu and remove it from the context list. after cpu online, the perf app may enable the event But it does not, right? Thanks for your kind comments. It does, actually. The major reason is application calls many syscall to start perf. 1) perf_event_open => perf_install_in_context; 2) perf_ioctl => perf_event_enable. After step 1), the cpu might be hot unplugged, and the event is removed from the cpu context. Then, the cpu is hog plugged back. The app runs at step 2) to enable the event on that cpu. Then, the cpu is hot unplugged again. As the event is not linked to the cpu's context, perf_cpu_notify can't disable the events. Mostly, the event is linked in cpuc->event_list[XXX]. At that time, applcation might free the event. When the cpu is plugged back, it might use the freed event and cause kernel panic. that without linked in context list again. when cpu offine the second time, the 'perf_cpu_notify' can't disable event on the pmu as the event doesn't link to context list. the perf app may free this event later(the free procedure try to disable event on the pmu but as the cpu is offline, the 'cpu_function_call(event->cpu, __perf_remove_from_context, event)' is failed) Failed how, below is __perf_install_in_context. . then after cpu online again, pmu will access freed perf_event and hit panic. so adding PERF_ATTACH_CONTEXT flag check before enable event to avoid this scenario. In fact it does not. If you look at perf_event_enable() there's a code path that doesn't call __perf_event_enable(). Here we hit it with a per-cpu event instead of task event. [ 157.670138 ] [] __perf_install_in_context+0xff/0x170 And yet, __perf_install_in_context isn't mentioned at all in the above. Change-Id: I7265d83159b9180e9be3a370ba50e067385547bd Signed-off-by: Yanmin Zhang Signed-off-by: Chen LinX Wrong SoB-chain, Yanmin didn't author this patch did he, seeing how From is you. And Yanmin didn't actually send me this patch either. Lin works with me in the same team. He is smart, but new in kernel upstream community. I debugged with him and he caught the root cause ahead of me. The patch is good. Lin is running more testing with the latest kernel 3.16.0-rc5 now. Thanks, Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf: Don't enable the perf_event without in PERF_ATTACH_CONTEXT status
On 2014/7/14 21:27, Peter Zijlstra wrote: On Thu, Jul 03, 2014 at 11:36:38AM +0800, Chen LinX wrote: From: Chen LinX linx.z.c...@intel.com when do cpu hotplug test and run below perf test together, pmu may access freed perf_event while true; do perf record -a -g -f sleep 10 rm perf.* done the scenario is that when cpu offline firstly, the 'perf_cpu_notify' will disable event on the pmu and remove it from the context list. after cpu online, the perf app may enable the event But it does not, right? Thanks for your kind comments. It does, actually. The major reason is application calls many syscall to start perf. 1) perf_event_open = perf_install_in_context; 2) perf_ioctl = perf_event_enable. After step 1), the cpu might be hot unplugged, and the event is removed from the cpu context. Then, the cpu is hog plugged back. The app runs at step 2) to enable the event on that cpu. Then, the cpu is hot unplugged again. As the event is not linked to the cpu's context, perf_cpu_notify can't disable the events. Mostly, the event is linked in cpuc-event_list[XXX]. At that time, applcation might free the event. When the cpu is plugged back, it might use the freed event and cause kernel panic. that without linked in context list again. when cpu offine the second time, the 'perf_cpu_notify' can't disable event on the pmu as the event doesn't link to context list. the perf app may free this event later(the free procedure try to disable event on the pmu but as the cpu is offline, the 'cpu_function_call(event-cpu, __perf_remove_from_context, event)' is failed) Failed how, below is __perf_install_in_context. . then after cpu online again, pmu will access freed perf_event and hit panic. so adding PERF_ATTACH_CONTEXT flag check before enable event to avoid this scenario. In fact it does not. If you look at perf_event_enable() there's a code path that doesn't call __perf_event_enable(). Here we hit it with a per-cpu event instead of task event. [ 157.670138 ] [8216321f] __perf_install_in_context+0xff/0x170 And yet, __perf_install_in_context isn't mentioned at all in the above. Change-Id: I7265d83159b9180e9be3a370ba50e067385547bd Signed-off-by: Yanmin Zhang yanmin.zh...@intel.com Signed-off-by: Chen LinX linx.z.c...@intel.com Wrong SoB-chain, Yanmin didn't author this patch did he, seeing how From is you. And Yanmin didn't actually send me this patch either. Lin works with me in the same team. He is smart, but new in kernel upstream community. I debugged with him and he caught the root cause ahead of me. The patch is good. Lin is running more testing with the latest kernel 3.16.0-rc5 now. Thanks, Yanmin -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH] md/dm-ioctl.c: optimize memory allocation in copy_params
On 2014/7/9 6:39, Mikulas Patocka wrote: Hi Mikulas, Thanks for your kind comments. I don't really know what is the purpose of this patch. In existing device mapper code, if kmalloc fails, the allocation is retried with __vmalloc. So there is no need to avoid kmalloc aritifically. kmalloc doesn't cause memory fragmentation. If the memory is too fragmented, kmalloc fails. If it isn't, it succeeds. But it doesn't cause memory being fragmented. I agree with you. The patch's original description is not appropriate. Basically, memory fragmentation is not caused by this kmalloc. The right description is: When memory is fragmented and most memory is used up, kmalloc a big memory might cause lots of OutOFMemory and system might kill lots of processes. Then, system might hang. We are enabling Android on mobiles and tablets. We hit OOM often at Monkey and other stress testing. Sometimes, kernel prints big memory allocation warning. Theoretically, it's a good idea to call kmalloc firstly. If it fails, use vmalloc. However, kernel assumes drivers do the right thing. When drivers call kmalloc to allocate a big memory, memory management would do the best to fulfill it. When memory is fragmented and most memory is used up, such allocation would cause big memory recollection. Some processes might be killed. The worse scenario is after a process is killed, it is restarted and allocates memory again. That causes system hang, or mobile users feel a big stall. We did fix a similar issue in one of our drivers. Usually, kernel and drivers can allocate big continuous memory at booting or initialization stage. After that, they need allocate small order memory. The best is to just allocate order-0 memory. If you have some other driver that fails because of large kmalloc failure, you should fix it to use scatter-gather DMA or (if the hardware can't do it) preallocate the memory when the driver is loaded. I agree with you. Here the patch fixes the issue, where dm might allocate big continuous memory after booting. Monkey testing triggers it by operating menu Setting=>Security=>InstallfromSDcard. We are catching all places in kernel where big memory allocation happens. This patch is just to fix one case. Thanks, Yanmin Mikulas On Fri, 4 Jul 2014, xinhui.pan wrote: KMALLOC_MAX_SIZE is too big, it easily cause memory fragmented and low memory when param_kernel->data_size is also big. That's not nice. So use vmalloc instead of kmalloc when size is larger than (PAGE_SIZE << 2). There are other drivers using kmalloc to gain a big size memory. And that cause our devices to hit hang and many worse issues. We don't benefit much when using kmalloc in such scenario. Signed-off-by: xinhui.pan Signed-off-by: yanmin.zhang --- drivers/md/dm-ioctl.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c index 5152142..31c3af9 100644 --- a/drivers/md/dm-ioctl.c +++ b/drivers/md/dm-ioctl.c @@ -1709,7 +1709,7 @@ static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl *param_kern * Use kmalloc() rather than vmalloc() when we can. */ dmi = NULL; - if (param_kernel->data_size <= KMALLOC_MAX_SIZE) { + if (param_kernel->data_size <= (PAGE_SIZE << 2)) { dmi = kmalloc(param_kernel->data_size, GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN); if (dmi) *param_flags |= DM_PARAMS_KMALLOC; -- 1.7.9.5 -- dm-devel mailing list dm-de...@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH] md/dm-ioctl.c: optimize memory allocation in copy_params
On 2014/7/9 6:39, Mikulas Patocka wrote: Hi Mikulas, Thanks for your kind comments. I don't really know what is the purpose of this patch. In existing device mapper code, if kmalloc fails, the allocation is retried with __vmalloc. So there is no need to avoid kmalloc aritifically. kmalloc doesn't cause memory fragmentation. If the memory is too fragmented, kmalloc fails. If it isn't, it succeeds. But it doesn't cause memory being fragmented. I agree with you. The patch's original description is not appropriate. Basically, memory fragmentation is not caused by this kmalloc. The right description is: When memory is fragmented and most memory is used up, kmalloc a big memory might cause lots of OutOFMemory and system might kill lots of processes. Then, system might hang. We are enabling Android on mobiles and tablets. We hit OOM often at Monkey and other stress testing. Sometimes, kernel prints big memory allocation warning. Theoretically, it's a good idea to call kmalloc firstly. If it fails, use vmalloc. However, kernel assumes drivers do the right thing. When drivers call kmalloc to allocate a big memory, memory management would do the best to fulfill it. When memory is fragmented and most memory is used up, such allocation would cause big memory recollection. Some processes might be killed. The worse scenario is after a process is killed, it is restarted and allocates memory again. That causes system hang, or mobile users feel a big stall. We did fix a similar issue in one of our drivers. Usually, kernel and drivers can allocate big continuous memory at booting or initialization stage. After that, they need allocate small order memory. The best is to just allocate order-0 memory. If you have some other driver that fails because of large kmalloc failure, you should fix it to use scatter-gather DMA or (if the hardware can't do it) preallocate the memory when the driver is loaded. I agree with you. Here the patch fixes the issue, where dm might allocate big continuous memory after booting. Monkey testing triggers it by operating menu Setting=Security=InstallfromSDcard. We are catching all places in kernel where big memory allocation happens. This patch is just to fix one case. Thanks, Yanmin Mikulas On Fri, 4 Jul 2014, xinhui.pan wrote: KMALLOC_MAX_SIZE is too big, it easily cause memory fragmented and low memory when param_kernel-data_size is also big. That's not nice. So use vmalloc instead of kmalloc when size is larger than (PAGE_SIZE 2). There are other drivers using kmalloc to gain a big size memory. And that cause our devices to hit hang and many worse issues. We don't benefit much when using kmalloc in such scenario. Signed-off-by: xinhui.pan xinhuix@intel.com Signed-off-by: yanmin.zhang yanmin_zh...@linux.intel.com --- drivers/md/dm-ioctl.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c index 5152142..31c3af9 100644 --- a/drivers/md/dm-ioctl.c +++ b/drivers/md/dm-ioctl.c @@ -1709,7 +1709,7 @@ static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl *param_kern * Use kmalloc() rather than vmalloc() when we can. */ dmi = NULL; - if (param_kernel-data_size = KMALLOC_MAX_SIZE) { + if (param_kernel-data_size = (PAGE_SIZE 2)) { dmi = kmalloc(param_kernel-data_size, GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN); if (dmi) *param_flags |= DM_PARAMS_KMALLOC; -- 1.7.9.5 -- dm-devel mailing list dm-de...@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Should Pstore(ramoops) records customized information?
On 2014/6/27 20:06, Liu hua wrote: 于 2014/6/26 8:57, Zhang, Yanmin 写道: On 2014/6/25 21:08, Liu hua wrote: 于 2014/6/25 8:41, Zhang, Yanmin 写道: On 2014/6/20 18:47, Liu hua wrote: On 2014/6/20 7:42, Luck, Tony wrote: BTW, I note that "extern struct pstore_info *psinfo" locates in fs/pstore/internal.h. So users out of directory "fs/pstore/" can not use pstore to record messages. We do not want other kernel users to use pstore, right? And can we break this? Yes we can make some interface visible to the rest of the kernel ... probably not the raw "*psinfo" though. Perhaps the pstore_alloc_ring_buffer() and pstore_write_ring_buffer() functions should be the ones exported to the rest of the kernel. ditoo.. Since other backends like efi and erst may can not privide such ring buffer. So pstore_alloc_ring_buffer should be a funciton pointer of pstore_info struct. Yes - that allows less capable backend like ERST and efivars to not provide the service. Since it becomes internal, you can drop the "pstore_" prefix. E.g. something like: int pstore_alloc_ring_buffer(char *name, int size) { return psinfo->alloc_ring_buffer(name, size); } EXPORT_SYMBOL_GPL(pstore_alloc_ring_buffer); ... and you have to find/make some global header for the "extern" declaration. I will make these RFC patch series according to our discussion. Thanks you very to valuable advice. Sorry for seeing your email late.We already worked out some patches to restructure pstore. Would you like to try patchset http://article.gmane.org/gmane.linux.kernel/1697680/? We have more patches available to add some flags to disable/enable specific zones. That's great! I have tried your patches. BTW, Your patches do not work on ARM platform, before I changed linker scripts; Initially, we just implemented it on x86. It's easy to extend it to ARM. Mostly change the arm vmlinux.lds.S to add the sections. Pls. also change setup_arch to allocate memory blocks for pstore. In the patchset, there is an example patch, including reserve memory and zone examples. Pls. reference it. And can we use this method in modules(I failed to do that)? It's a good question. There are many approaches to support modules. 1) Define the zone in built-in files and export it.Then, you can use it in module. 2) Define the zone and new tracer functions in built-in files and export the tracer functions. After a quick glance and try, I think my idea is a little different from yours. I will reply you later. Pls. Share your opinions. We are improving pstore to make it easier to be used. This feature can use in real products (actually we have done that), because usually several mega-byte-size ram is enough and it is very useful for fault location. So I want that pstore can be implemented in products, not just in labs. These seems that there are at least two ways to make pstore visible to other kernel users: (1) static allocation:(your way, maybe my description is not good, please correct me) When kernel image is made, zones are determinate; So if moudules want to use a zone, we should define it in kernel source before compiling and export it; (a)advantage: This method will not fail at most time if it passed at first time. (b)disadvantage: Engineer should change the kernel source code if he want to get a zone to record something. For lab, it is good enough; but for products, different products may use different kernel source codes if they want record different messages. It is very expensive. (2) dynamic alloction: (ring buffer similar to your zone) (1) We should introduce metadata to describe the ring buffers in the ramoops bankend. So when initializing, we just need to read metadata. then we know information of all ring buffers. So we can read and manage all ring buffers in a list named "ring_buffer_list"; (2) when we call pstore_alloc_ring_buffer(name,size). If "ring_buffer_list" contains this ring buffer, we do nothing if size check passes. Else we create a ring buffer description and add it to list; (3) When we call pstore_write_ring_buffer(name,str). we find this ring buffer in "ring_buffer_list" by name; and then copy strings to this ring buffer. disadvantage: (1) So this alloc maybe fail when name is the same, but size isn't; (2) We should find a way to format the backend. erase is not enough. (earse just clear the data of a ring buffer, but the ring buffer still exists). advantage: (1) Os venders can implement this feature. Then people can use it even they can not compile the kernel. (2) We can determin how to use the ramoops backend at runtime, rather than before compiling t
Re: Should Pstore(ramoops) records customized information?
On 2014/6/27 20:06, Liu hua wrote: 于 2014/6/26 8:57, Zhang, Yanmin 写道: On 2014/6/25 21:08, Liu hua wrote: 于 2014/6/25 8:41, Zhang, Yanmin 写道: On 2014/6/20 18:47, Liu hua wrote: On 2014/6/20 7:42, Luck, Tony wrote: BTW, I note that extern struct pstore_info *psinfo locates in fs/pstore/internal.h. So users out of directory fs/pstore/ can not use pstore to record messages. We do not want other kernel users to use pstore, right? And can we break this? Yes we can make some interface visible to the rest of the kernel ... probably not the raw *psinfo though. Perhaps the pstore_alloc_ring_buffer() and pstore_write_ring_buffer() functions should be the ones exported to the rest of the kernel. ditoo.. Since other backends like efi and erst may can not privide such ring buffer. So pstore_alloc_ring_buffer should be a funciton pointer of pstore_info struct. Yes - that allows less capable backend like ERST and efivars to not provide the service. Since it becomes internal, you can drop the pstore_ prefix. E.g. something like: int pstore_alloc_ring_buffer(char *name, int size) { return psinfo-alloc_ring_buffer(name, size); } EXPORT_SYMBOL_GPL(pstore_alloc_ring_buffer); ... and you have to find/make some global header for the extern declaration. I will make these RFC patch series according to our discussion. Thanks you very to valuable advice. Sorry for seeing your email late.We already worked out some patches to restructure pstore. Would you like to try patchset http://article.gmane.org/gmane.linux.kernel/1697680/? We have more patches available to add some flags to disable/enable specific zones. That's great! I have tried your patches. BTW, Your patches do not work on ARM platform, before I changed linker scripts; Initially, we just implemented it on x86. It's easy to extend it to ARM. Mostly change the arm vmlinux.lds.S to add the sections. Pls. also change setup_arch to allocate memory blocks for pstore. In the patchset, there is an example patch, including reserve memory and zone examples. Pls. reference it. And can we use this method in modules(I failed to do that)? It's a good question. There are many approaches to support modules. 1) Define the zone in built-in files and export it.Then, you can use it in module. 2) Define the zone and new tracer functions in built-in files and export the tracer functions. After a quick glance and try, I think my idea is a little different from yours. I will reply you later. Pls. Share your opinions. We are improving pstore to make it easier to be used. This feature can use in real products (actually we have done that), because usually several mega-byte-size ram is enough and it is very useful for fault location. So I want that pstore can be implemented in products, not just in labs. These seems that there are at least two ways to make pstore visible to other kernel users: (1) static allocation:(your way, maybe my description is not good, please correct me) When kernel image is made, zones are determinate; So if moudules want to use a zone, we should define it in kernel source before compiling and export it; (a)advantage: This method will not fail at most time if it passed at first time. (b)disadvantage: Engineer should change the kernel source code if he want to get a zone to record something. For lab, it is good enough; but for products, different products may use different kernel source codes if they want record different messages. It is very expensive. (2) dynamic alloction: (ring buffer similar to your zone) (1) We should introduce metadata to describe the ring buffers in the ramoops bankend. So when initializing, we just need to read metadata. then we know information of all ring buffers. So we can read and manage all ring buffers in a list named ring_buffer_list; (2) when we call pstore_alloc_ring_buffer(name,size). If ring_buffer_list contains this ring buffer, we do nothing if size check passes. Else we create a ring buffer description and add it to list; (3) When we call pstore_write_ring_buffer(name,str). we find this ring buffer in ring_buffer_list by name; and then copy strings to this ring buffer. disadvantage: (1) So this alloc maybe fail when name is the same, but size isn't; (2) We should find a way to format the backend. erase is not enough. (earse just clear the data of a ring buffer, but the ring buffer still exists). advantage: (1) Os venders can implement this feature. Then people can use it even they can not compile the kernel. (2) We can determin how to use the ramoops backend at runtime, rather than before compiling the kernel. So which way do we really need? And what do you think, Tony or others ? I got your
Re: Should Pstore(ramoops) records customized information?
On 2014/6/25 21:08, Liu hua wrote: 于 2014/6/25 8:41, Zhang, Yanmin 写道: On 2014/6/20 18:47, Liu hua wrote: On 2014/6/20 7:42, Luck, Tony wrote: BTW, I note that "extern struct pstore_info *psinfo" locates in fs/pstore/internal.h. So users out of directory "fs/pstore/" can not use pstore to record messages. We do not want other kernel users to use pstore, right? And can we break this? Yes we can make some interface visible to the rest of the kernel ... probably not the raw "*psinfo" though. Perhaps the pstore_alloc_ring_buffer() and pstore_write_ring_buffer() functions should be the ones exported to the rest of the kernel. ditoo.. Since other backends like efi and erst may can not privide such ring buffer. So pstore_alloc_ring_buffer should be a funciton pointer of pstore_info struct. Yes - that allows less capable backend like ERST and efivars to not provide the service. Since it becomes internal, you can drop the "pstore_" prefix. E.g. something like: int pstore_alloc_ring_buffer(char *name, int size) { return psinfo->alloc_ring_buffer(name, size); } EXPORT_SYMBOL_GPL(pstore_alloc_ring_buffer); ... and you have to find/make some global header for the "extern" declaration. I will make these RFC patch series according to our discussion. Thanks you very to valuable advice. Sorry for seeing your email late.We already worked out some patches to restructure pstore. Would you like to try patchset http://article.gmane.org/gmane.linux.kernel/1697680/? We have more patches available to add some flags to disable/enable specific zones. That's great! I have tried your patches. BTW, Your patches do not work on ARM platform, before I changed linker scripts; Initially, we just implemented it on x86. It's easy to extend it to ARM. Mostly change the arm vmlinux.lds.S to add the sections. Pls. also change setup_arch to allocate memory blocks for pstore. In the patchset, there is an example patch, including reserve memory and zone examples. Pls. reference it. And can we use this method in modules(I failed to do that)? It's a good question. There are many approaches to support modules. 1) Define the zone in built-in files and export it.Then, you can use it in module. 2) Define the zone and new tracer functions in built-in files and export the tracer functions. After a quick glance and try, I think my idea is a little different from yours. I will reply you later. Pls. Share your opinions. We are improving pstore to make it easier to be used. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Should Pstore(ramoops) records customized information?
On 2014/6/25 21:08, Liu hua wrote: 于 2014/6/25 8:41, Zhang, Yanmin 写道: On 2014/6/20 18:47, Liu hua wrote: On 2014/6/20 7:42, Luck, Tony wrote: BTW, I note that extern struct pstore_info *psinfo locates in fs/pstore/internal.h. So users out of directory fs/pstore/ can not use pstore to record messages. We do not want other kernel users to use pstore, right? And can we break this? Yes we can make some interface visible to the rest of the kernel ... probably not the raw *psinfo though. Perhaps the pstore_alloc_ring_buffer() and pstore_write_ring_buffer() functions should be the ones exported to the rest of the kernel. ditoo.. Since other backends like efi and erst may can not privide such ring buffer. So pstore_alloc_ring_buffer should be a funciton pointer of pstore_info struct. Yes - that allows less capable backend like ERST and efivars to not provide the service. Since it becomes internal, you can drop the pstore_ prefix. E.g. something like: int pstore_alloc_ring_buffer(char *name, int size) { return psinfo-alloc_ring_buffer(name, size); } EXPORT_SYMBOL_GPL(pstore_alloc_ring_buffer); ... and you have to find/make some global header for the extern declaration. I will make these RFC patch series according to our discussion. Thanks you very to valuable advice. Sorry for seeing your email late.We already worked out some patches to restructure pstore. Would you like to try patchset http://article.gmane.org/gmane.linux.kernel/1697680/? We have more patches available to add some flags to disable/enable specific zones. That's great! I have tried your patches. BTW, Your patches do not work on ARM platform, before I changed linker scripts; Initially, we just implemented it on x86. It's easy to extend it to ARM. Mostly change the arm vmlinux.lds.S to add the sections. Pls. also change setup_arch to allocate memory blocks for pstore. In the patchset, there is an example patch, including reserve memory and zone examples. Pls. reference it. And can we use this method in modules(I failed to do that)? It's a good question. There are many approaches to support modules. 1) Define the zone in built-in files and export it.Then, you can use it in module. 2) Define the zone and new tracer functions in built-in files and export the tracer functions. After a quick glance and try, I think my idea is a little different from yours. I will reply you later. Pls. Share your opinions. We are improving pstore to make it easier to be used. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Should Pstore(ramoops) records customized information?
On 2014/6/20 18:47, Liu hua wrote: On 2014/6/20 7:42, Luck, Tony wrote: BTW, I note that "extern struct pstore_info *psinfo" locates in fs/pstore/internal.h. So users out of directory "fs/pstore/" can not use pstore to record messages. We do not want other kernel users to use pstore, right? And can we break this? Yes we can make some interface visible to the rest of the kernel ... probably not the raw "*psinfo" though. Perhaps the pstore_alloc_ring_buffer() and pstore_write_ring_buffer() functions should be the ones exported to the rest of the kernel. ditoo.. Since other backends like efi and erst may can not privide such ring buffer. So pstore_alloc_ring_buffer should be a funciton pointer of pstore_info struct. Yes - that allows less capable backend like ERST and efivars to not provide the service. Since it becomes internal, you can drop the "pstore_" prefix. E.g. something like: int pstore_alloc_ring_buffer(char *name, int size) { return psinfo->alloc_ring_buffer(name, size); } EXPORT_SYMBOL_GPL(pstore_alloc_ring_buffer); ... and you have to find/make some global header for the "extern" declaration. I will make these RFC patch series according to our discussion. Thanks you very to valuable advice. Sorry for seeing your email late.We already worked out some patches to restructure pstore. Would you like to try patchset http://article.gmane.org/gmane.linux.kernel/1697680/? We have more patches available to add some flags to disable/enable specific zones. Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Should Pstore(ramoops) records customized information?
On 2014/6/20 18:47, Liu hua wrote: On 2014/6/20 7:42, Luck, Tony wrote: BTW, I note that extern struct pstore_info *psinfo locates in fs/pstore/internal.h. So users out of directory fs/pstore/ can not use pstore to record messages. We do not want other kernel users to use pstore, right? And can we break this? Yes we can make some interface visible to the rest of the kernel ... probably not the raw *psinfo though. Perhaps the pstore_alloc_ring_buffer() and pstore_write_ring_buffer() functions should be the ones exported to the rest of the kernel. ditoo.. Since other backends like efi and erst may can not privide such ring buffer. So pstore_alloc_ring_buffer should be a funciton pointer of pstore_info struct. Yes - that allows less capable backend like ERST and efivars to not provide the service. Since it becomes internal, you can drop the pstore_ prefix. E.g. something like: int pstore_alloc_ring_buffer(char *name, int size) { return psinfo-alloc_ring_buffer(name, size); } EXPORT_SYMBOL_GPL(pstore_alloc_ring_buffer); ... and you have to find/make some global header for the extern declaration. I will make these RFC patch series according to our discussion. Thanks you very to valuable advice. Sorry for seeing your email late.We already worked out some patches to restructure pstore. Would you like to try patchset http://article.gmane.org/gmane.linux.kernel/1697680/? We have more patches available to add some flags to disable/enable specific zones. Yanmin -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs
On 2014/6/5 17:15, Peter Zijlstra wrote: On Thu, Jun 05, 2014 at 04:00:24PM +0800, Zhang, Yanmin wrote: On 2014/6/5 15:55, Peter Zijlstra wrote: Why does pstore cause corruption? I thought that stuff was supposed to be 'good' ? pstore is good if the board is reset by WarmReset as memory content is kept across rebooting. If it's a ColdReset, memory might lose some or all contents. My board uses Coldreset, which is a very fast ColdReset. Most memory content can be kept. But sometimes, some data has little change. Oh, I thought pstore was persistent across cold resets or even power outages. My bad then. Sorry for misleading you. It depends on backend. Pstore can use RAM or other storage, for example, EFI-provided backend. If using EFI backend, pstore can keep persistent across cold reset or even power outrages. If using RAM, usually pstore can keep persistent at WarmReset. Since RAM access (even at non-cache mode) is fast and more flexible, in fact, we often use RAM as backend of pstore. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs
On 2014/6/5 15:55, Peter Zijlstra wrote: On Thu, Jun 05, 2014 at 03:33:21PM +0800, Liu ShuoX wrote: On Thu 5.Jun'14 at 9:19:19 +0200, Peter Zijlstra wrote: On Thu, Jun 05, 2014 at 10:36:10AM +0800, Liu ShuoX wrote: From: Zhang Yanmin We hit a kernel panic when running perf to collect some performance data. kenel is x86_64 and user space apps are 32bit. [ 71.965351, 1] [ Binder_2] BUG: unable to handle kernel NULL pointer dereference at 0004 [ 71.965360, 1] [ Binder_2] IP: [] get_segment_base+0x71/0xc0 [ 71.965367, 1] [ Binder_2] PGD 6c65f067 PUD 0 [ 71.965375, 1] [ Binder_2] Oops: [#1] PREEMPT SMP [ 71.965413, 1] [ Binder_2] Modules linked in: ddrgx snd_merr_dpcm_wm8958 snd_intel_sst snd_soc_sst_platform snd_soc_wm8994 snd_soc_wm_hubs lm3559 imx1x5 atomisp_css2401a0_v21 libmsrlisthelper rmi4 bcm_bt_lpm videobuf_vmalloc videobuf_core fps_throttle hdmi_audio pn544(O) tngdisp bcm4335(O) cfg80211 [ 71.965420, 1] [ Binder_2] CPU: 1 PID: 304 Comm: Binder_2 Tainted: G W O 3.10.20-263902-g184bfbc-dirty #14 [ 71.965426, 1] [ Binder_2] task: 8800764dc300 ti: 88006c6e8000 task.ti: 88006c6e8000 [ 71.965439, 1] [ Binder_2] RIP: 0010:[] [] get_segment_base+0x71/0xc0 ^ [ 71.965<44, 1] [ Binder_2] RSP: 0018:^X8007ea87b98 EFLAGS: 00010092 ^ ^ [ 71.965447, 1] [ !Binder_2] RAX: 0024 RBX: RCX: ^ [ 71.965450, 1] [ Binder_2] RDX: RSI: RDI: 0009 [ 71.965454, 1] [ Binder_2] RBP: 88007ea87ba8 R08: 83143b3c R09: 831848a8 [ 71.965458, 1] [ Binder_2] R10: R11: 001bf2d8 R12: [ 71.965462, 1] [ Binder_2] R13: 88006c6e9fd8 R14: 88006c6e9f58 R15: 8800764dc300 [ 71.965468, 1_ [ Binder_2] FS: () GS:88007ea8(006b) knlGS:f704add0 ^ Are you suffering some serious corruption? The log captured by pstore after rebooting, so there are some corruption. Please ignore those. Why does pstore cause corruption? I thought that stuff was supposed to be 'good' ? pstore is good if the board is reset by WarmReset as memory content is kept across rebooting. If it's a ColdReset, memory might lose some or all contents. My board uses Coldreset, which is a very fast ColdReset. Most memory content can be kept. But sometimes, some data has little change. Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs
On 2014/6/5 15:55, Peter Zijlstra wrote: On Thu, Jun 05, 2014 at 03:33:21PM +0800, Liu ShuoX wrote: On Thu 5.Jun'14 at 9:19:19 +0200, Peter Zijlstra wrote: On Thu, Jun 05, 2014 at 10:36:10AM +0800, Liu ShuoX wrote: From: Zhang Yanmin yanmin.zh...@intel.com We hit a kernel panic when running perf to collect some performance data. kenel is x86_64 and user space apps are 32bit. [ 71.965351, 1] [ Binder_2] BUG: unable to handle kernel NULL pointer dereference at 0004 [ 71.965360, 1] [ Binder_2] IP: [82012091] get_segment_base+0x71/0xc0 [ 71.965367, 1] [ Binder_2] PGD 6c65f067 PUD 0 [ 71.965375, 1] [ Binder_2] Oops: [#1] PREEMPT SMP [ 71.965413, 1] [ Binder_2] Modules linked in: ddrgx snd_merr_dpcm_wm8958 snd_intel_sst snd_soc_sst_platform snd_soc_wm8994 snd_soc_wm_hubs lm3559 imx1x5 atomisp_css2401a0_v21 libmsrlisthelper rmi4 bcm_bt_lpm videobuf_vmalloc videobuf_core fps_throttle hdmi_audio pn544(O) tngdisp bcm4335(O) cfg80211 [ 71.965420, 1] [ Binder_2] CPU: 1 PID: 304 Comm: Binder_2 Tainted: G W O 3.10.20-263902-g184bfbc-dirty #14 [ 71.965426, 1] [ Binder_2] task: 8800764dc300 ti: 88006c6e8000 task.ti: 88006c6e8000 [ 71.965439, 1] [ Binder_2] RIP: 0010:[82012091] [ff?f82012091] get_segment_base+0x71/0xc0 ^ [ 71.96544, 1] [ Binder_2] RSP: 0018:^X8007ea87b98 EFLAGS: 00010092 ^ ^ [ 71.965447, 1] [ !Binder_2] RAX: 0024 RBX: RCX: ^ [ 71.965450, 1] [ Binder_2] RDX: RSI: RDI: 0009 [ 71.965454, 1] [ Binder_2] RBP: 88007ea87ba8 R08: 83143b3c R09: 831848a8 [ 71.965458, 1] [ Binder_2] R10: R11: 001bf2d8 R12: [ 71.965462, 1] [ Binder_2] R13: 88006c6e9fd8 R14: 88006c6e9f58 R15: 8800764dc300 [ 71.965468, 1_ [ Binder_2] FS: () GS:88007ea8(006b) knlGS:f704add0 ^ Are you suffering some serious corruption? The log captured by pstore after rebooting, so there are some corruption. Please ignore those. Why does pstore cause corruption? I thought that stuff was supposed to be 'good' ? pstore is good if the board is reset by WarmReset as memory content is kept across rebooting. If it's a ColdReset, memory might lose some or all contents. My board uses Coldreset, which is a very fast ColdReset. Most memory content can be kept. But sometimes, some data has little change. Yanmin -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf: fix kernel panic when parsing user space CS saved in pt_regs
On 2014/6/5 17:15, Peter Zijlstra wrote: On Thu, Jun 05, 2014 at 04:00:24PM +0800, Zhang, Yanmin wrote: On 2014/6/5 15:55, Peter Zijlstra wrote: Why does pstore cause corruption? I thought that stuff was supposed to be 'good' ? pstore is good if the board is reset by WarmReset as memory content is kept across rebooting. If it's a ColdReset, memory might lose some or all contents. My board uses Coldreset, which is a very fast ColdReset. Most memory content can be kept. But sometimes, some data has little change. Oh, I thought pstore was persistent across cold resets or even power outages. My bad then. Sorry for misleading you. It depends on backend. Pstore can use RAM or other storage, for example, EFI-provided backend. If using EFI backend, pstore can keep persistent across cold reset or even power outrages. If using RAM, usually pstore can keep persistent at WarmReset. Since RAM access (even at non-cache mode) is fast and more flexible, in fact, we often use RAM as backend of pstore. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: linux-next: build failure after merge of the ia64 tree
Sorry. We would fix it ASAP. Yanmin -Original Message- From: Stephen Rothwell [mailto:s...@canb.auug.org.au] Sent: Monday, March 31, 2014 8:08 AM To: Luck, Tony Cc: linux-n...@vger.kernel.org; linux-kernel@vger.kernel.org; Liu, ShuoX; Zhang, Yanmin Subject: linux-next: build failure after merge of the ia64 tree Hi all, After merging the ia64 tree, today's linux-next build (x86_64 allmodconfig) failed like this: fs/built-in.o: In function `pstore_file_open': inode.c:(.text+0xa29e9): undefined reference to `persistent_ram_size' Caused by commit f5d7e81eda59 ("pstore: Support current records dump in ramoops"). I have used the ia64 tree from next-20140328 for today. -- Cheers, Stephen Rothwells...@canb.auug.org.au -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: linux-next: build failure after merge of the ia64 tree
Sorry. We would fix it ASAP. Yanmin -Original Message- From: Stephen Rothwell [mailto:s...@canb.auug.org.au] Sent: Monday, March 31, 2014 8:08 AM To: Luck, Tony Cc: linux-n...@vger.kernel.org; linux-kernel@vger.kernel.org; Liu, ShuoX; Zhang, Yanmin Subject: linux-next: build failure after merge of the ia64 tree Hi all, After merging the ia64 tree, today's linux-next build (x86_64 allmodconfig) failed like this: fs/built-in.o: In function `pstore_file_open': inode.c:(.text+0xa29e9): undefined reference to `persistent_ram_size' Caused by commit f5d7e81eda59 (pstore: Support current records dump in ramoops). I have used the ia64 tree from next-20140328 for today. -- Cheers, Stephen Rothwells...@canb.auug.org.au -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] pstore: skip zero size persistent ram buffer in traverse
On 2014/2/28 14:38, shuox@intel.com wrote: From: Liu ShuoX In ramoops_pstore_read, a valid prz pointer with zero size buffer will break traverse of all persistent ram buffers. The latter buffer might be lost. Andrew, Would you like to merge it to your testing tree? pstore is a very important feature for debugging hard issues on Android mobiles. Yanmin Signed-off-by: Liu ShuoX --- fs/pstore/ram.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c index a5d0cab..929ea55 100644 --- a/fs/pstore/ram.c +++ b/fs/pstore/ram.c @@ -119,12 +119,12 @@ ramoops_get_next_prz(struct persistent_ram_zone *przs[], uint *c, uint max, prz = przs[i]; - if (update) { - /* Update old/shadowed buffer. */ + /* Update old/shadowed buffer. */ + if (update) persistent_ram_save_old(prz); - if (!persistent_ram_old_size(prz)) - return NULL; - } + + if (!persistent_ram_old_size(prz)) + return NULL; *typep = type; *id = i; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] pstore: reset ftrace_read_cnt at ramoops_pstore_open
On 2014/2/28 14:37, shuox@intel.com wrote: From: Liu ShuoX ftrace_read_cnt need to be reset in open to support mutli times getting the records. Andrew, Would you like to merge it to your testing tree? pstore is a very important feature for debugging hard issues on Android mobiles. Yanmin Signed-off-by: Liu ShuoX --- fs/pstore/ram.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c index fa8cef2..a5d0cab 100644 --- a/fs/pstore/ram.c +++ b/fs/pstore/ram.c @@ -101,6 +101,7 @@ static int ramoops_pstore_open(struct pstore_info *psi) cxt->dump_read_cnt = 0; cxt->console_read_cnt = 0; + cxt->ftrace_read_cnt = 0; return 0; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] pstore: reset ftrace_read_cnt at ramoops_pstore_open
On 2014/2/28 14:37, shuox@intel.com wrote: From: Liu ShuoX shuox@intel.com ftrace_read_cnt need to be reset in open to support mutli times getting the records. Andrew, Would you like to merge it to your testing tree? pstore is a very important feature for debugging hard issues on Android mobiles. Yanmin Signed-off-by: Liu ShuoX shuox@intel.com --- fs/pstore/ram.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c index fa8cef2..a5d0cab 100644 --- a/fs/pstore/ram.c +++ b/fs/pstore/ram.c @@ -101,6 +101,7 @@ static int ramoops_pstore_open(struct pstore_info *psi) cxt-dump_read_cnt = 0; cxt-console_read_cnt = 0; + cxt-ftrace_read_cnt = 0; return 0; } -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] pstore: skip zero size persistent ram buffer in traverse
On 2014/2/28 14:38, shuox@intel.com wrote: From: Liu ShuoX shuox@intel.com In ramoops_pstore_read, a valid prz pointer with zero size buffer will break traverse of all persistent ram buffers. The latter buffer might be lost. Andrew, Would you like to merge it to your testing tree? pstore is a very important feature for debugging hard issues on Android mobiles. Yanmin Signed-off-by: Liu ShuoX shuox@intel.com --- fs/pstore/ram.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c index a5d0cab..929ea55 100644 --- a/fs/pstore/ram.c +++ b/fs/pstore/ram.c @@ -119,12 +119,12 @@ ramoops_get_next_prz(struct persistent_ram_zone *przs[], uint *c, uint max, prz = przs[i]; - if (update) { - /* Update old/shadowed buffer. */ + /* Update old/shadowed buffer. */ + if (update) persistent_ram_save_old(prz); - if (!persistent_ram_old_size(prz)) - return NULL; - } + + if (!persistent_ram_old_size(prz)) + return NULL; *typep = type; *id = i; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] klist: del waiter from klist_remove_waiters before wakeup waitting process
Andrew, Would you like to merge the patch to your MM tree? Yanmin >-Original Message- >From: Peter Zijlstra [mailto:pet...@infradead.org] >Sent: Friday, May 03, 2013 6:42 PM >To: Wang, Biao >Cc: a...@linux-foundation.org; linux-kernel@vger.kernel.org; >mi...@redhat.com; Zhang, Yanmin; moc...@digitalimplant.org >Subject: Re: [PATCH] klist: del waiter from klist_remove_waiters before wakeup >waitting process > >On Fri, May 03, 2013 at 03:06:36PM +0800, wangbiao wrote: >> From: "wang, biao" >> Date: Fri, 3 May 2013 14:18:34 +0800 >> Subject: [PATCH] klist: del waiter from klist_remove_waiters before >> wakeup waitting process >> >> There is a race between klist_remove and klist_release. klist_remove >> uses a local var waiter saved on stack. When klist_release calls >> wake_up_process(waiter->process) to wake up the waiter, waiter might >> run immediately and reuse the stack. Then, klist_release calls >> list_del(>list) to change previous wait data and cause prior >> waiter thread corrupt. >> >> The patch fixes it against kernel 3.9. > > >I've never seen that code before in my life; but after a quick look you appear >to >be completely right. > >> Signed-off-by: wang, biao > >Acked-by: Peter Zijlstra > >> --- >> lib/klist.c |2 +- >> 1 files changed, 1 insertions(+), 1 deletions(-) >> >> diff --git a/lib/klist.c b/lib/klist.c index 0874e41..358a368 100644 >> --- a/lib/klist.c >> +++ b/lib/klist.c >> @@ -193,10 +193,10 @@ static void klist_release(struct kref *kref) >> if (waiter->node != n) >> continue; >> >> +list_del(>list); >> waiter->woken = 1; >> mb(); >> wake_up_process(waiter->process); >> -list_del(>list); >> } >> spin_unlock(_remove_lock); >> knode_set_klist(n, NULL); >> -- >> 1.7.6 >> >> >> -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] hrtimer:__run_hrtimer races with enqueue_hrtimer
>-Original Message- >From: Thomas Gleixner [mailto:t...@linutronix.de] >Sent: Friday, October 26, 2012 5:55 PM >To: He, Bo >Cc: linux-kernel@vger.kernel.org; Peter Zijlstra; Ingo Molnar; >yanmin_zh...@linux.intel.com; Zhang, Yanmin >Subject: Re: [PATCH] hrtimer:__run_hrtimer races with enqueue_hrtimer > >On Fri, 26 Oct 2012, he, bo wrote: >> From: Yanmin Zhang >> >> We hit a kernel panic at __run_hrtimer=>BUG_ON(timer->state != >HRTIMER_STATE_CALLBACK). >> <2>[ 10.226053, 3] kernel BUG at >/home/android/xiaobing/ymz/r4/hardware/intel/linux-2.6/kernel/hrtimer.c:1228 >! >> >> Basically, __run_hrtimer has a race with enqueue_hrtimer. When >> __run_hrtimer calls the timer callback fn, another thread might call >> enqueue_hrtimer or hrtimer_start to requeue it, and the timer->state >> is equal to HRTIMER_STATE_CALLBACK|HRTIMER_STATE_ENQUEUED, which >> causes the BUG_ON(timer->state != HRTIMER_STATE_CALLBACK) checking >> fails. >> >> The patch fixes it by checking only bit HRTIMER_STATE_CALLBACK. > >This does not fix it. It makes it worse. Thanks for your kind comments. We found that and sent V2 at https://lkml.org/lkml/2012/10/26/172. > >> Signed-off-by: Yanmin Zhang >> Reviewed-by: He, Bo >> --- >> kernel/hrtimer.c |2 +- >> 1 files changed, 1 insertions(+), 1 deletions(-) >> >> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c >> index 6db7a5e..6280184 100644 >> --- a/kernel/hrtimer.c >> +++ b/kernel/hrtimer.c >> @@ -1235,7 +1235,7 @@ static void __run_hrtimer(struct hrtimer *timer, >ktime_t *now) >> * hrtimer_start_range_ns() or in hrtimer_interrupt() >> */ >> if (restart != HRTIMER_NORESTART) { >> -BUG_ON(timer->state != HRTIMER_STATE_CALLBACK); >> +BUG_ON(!(timer->state & HRTIMER_STATE_CALLBACK)); >> enqueue_hrtimer(timer, base); >> } > >What you are allowing here is enqueueing an already enqueued timer >again. I don't know why this does not explode elsewhere, but that's >probably pure luck. It's not allowed to double enqueue a timer. You are right. > >So no, this is not a solution. The problem is not in the core timer >code, the problem is in the code which uses that timer. > >Your code is returning HRTIMER_RESTART from the timer callback and at >the same time it starts the timer from some other context. That's what >needs to be fixed. The timer user should fix it. But could we also change hrtimer to make it more stable? At least, instead of panic, could we print some information and go ahead to let kernel continue? Thanks, Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] hrtimer:__run_hrtimer races with enqueue_hrtimer
-Original Message- From: Thomas Gleixner [mailto:t...@linutronix.de] Sent: Friday, October 26, 2012 5:55 PM To: He, Bo Cc: linux-kernel@vger.kernel.org; Peter Zijlstra; Ingo Molnar; yanmin_zh...@linux.intel.com; Zhang, Yanmin Subject: Re: [PATCH] hrtimer:__run_hrtimer races with enqueue_hrtimer On Fri, 26 Oct 2012, he, bo wrote: From: Yanmin Zhang yanmin.zh...@intel.com We hit a kernel panic at __run_hrtimer=BUG_ON(timer-state != HRTIMER_STATE_CALLBACK). 2[ 10.226053, 3] kernel BUG at /home/android/xiaobing/ymz/r4/hardware/intel/linux-2.6/kernel/hrtimer.c:1228 ! Basically, __run_hrtimer has a race with enqueue_hrtimer. When __run_hrtimer calls the timer callback fn, another thread might call enqueue_hrtimer or hrtimer_start to requeue it, and the timer-state is equal to HRTIMER_STATE_CALLBACK|HRTIMER_STATE_ENQUEUED, which causes the BUG_ON(timer-state != HRTIMER_STATE_CALLBACK) checking fails. The patch fixes it by checking only bit HRTIMER_STATE_CALLBACK. This does not fix it. It makes it worse. Thanks for your kind comments. We found that and sent V2 at https://lkml.org/lkml/2012/10/26/172. Signed-off-by: Yanmin Zhang yanmin.zh...@intel.com Reviewed-by: He, Bo bo...@intel.com --- kernel/hrtimer.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c index 6db7a5e..6280184 100644 --- a/kernel/hrtimer.c +++ b/kernel/hrtimer.c @@ -1235,7 +1235,7 @@ static void __run_hrtimer(struct hrtimer *timer, ktime_t *now) * hrtimer_start_range_ns() or in hrtimer_interrupt() */ if (restart != HRTIMER_NORESTART) { -BUG_ON(timer-state != HRTIMER_STATE_CALLBACK); +BUG_ON(!(timer-state HRTIMER_STATE_CALLBACK)); enqueue_hrtimer(timer, base); } What you are allowing here is enqueueing an already enqueued timer again. I don't know why this does not explode elsewhere, but that's probably pure luck. It's not allowed to double enqueue a timer. You are right. So no, this is not a solution. The problem is not in the core timer code, the problem is in the code which uses that timer. Your code is returning HRTIMER_RESTART from the timer callback and at the same time it starts the timer from some other context. That's what needs to be fixed. The timer user should fix it. But could we also change hrtimer to make it more stable? At least, instead of panic, could we print some information and go ahead to let kernel continue? Thanks, Yanmin -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: Why hold device_lock when calling callback in pci_walk_bus?
Some error handling functions call pci_walk_bus. For example, pci-e aer. Here we lock the device, so the driver wouldn't detach from the device, as the cb might call driver's callback function. -Original Message- From: Huang, Ying Sent: Friday, September 28, 2012 4:15 PM To: bhelg...@google.com Cc: Greg Kroah-Hartman; Zhang, Yanmin; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; r...@sisk.pl Subject: Why hold device_lock when calling callback in pci_walk_bus? Hi, All, If my understanding were correct, device_lock is used to provide mutual exclusion between device probe/remove/suspend/resume etc. Why hold device_lock when calling callback in pci_walk_bus. This is introduced by the following commit. commit d71374dafbba7ec3f67371d3b7e9f6310a588808 Author: Zhang Yanmin Date: Fri Jun 2 12:35:43 2006 +0800 [PATCH] PCI: fix race with pci_walk_bus and pci_destroy_dev pci_walk_bus has a race with pci_destroy_dev. When cb is called in pci_walk_bus, pci_destroy_dev might unlink the dev pointed by next. Later on in the next loop, pointer next becomes NULL and cause kernel panic. Below patch against 2.6.17-rc4 fixes it by changing pci_bus_lock (spin_lock) to pci_bus_sem (rw_semaphore). Signed-off-by: Zhang Yanmin Signed-off-by: Greg Kroah-Hartman Corresponding email thread is: https://lkml.org/lkml/2006/5/26/38 But from the commit and email thread, I can not find why we need to do that. I ask this question because I want to use pci_walk_bus in a function (in pci runtime resume path) which may be called with device_lock held. Can anyone help me on that? Best Regards, Huang Ying
RE: Why hold device_lock when calling callback in pci_walk_bus?
Some error handling functions call pci_walk_bus. For example, pci-e aer. Here we lock the device, so the driver wouldn't detach from the device, as the cb might call driver's callback function. -Original Message- From: Huang, Ying Sent: Friday, September 28, 2012 4:15 PM To: bhelg...@google.com Cc: Greg Kroah-Hartman; Zhang, Yanmin; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; r...@sisk.pl Subject: Why hold device_lock when calling callback in pci_walk_bus? Hi, All, If my understanding were correct, device_lock is used to provide mutual exclusion between device probe/remove/suspend/resume etc. Why hold device_lock when calling callback in pci_walk_bus. This is introduced by the following commit. commit d71374dafbba7ec3f67371d3b7e9f6310a588808 Author: Zhang Yanmin yanmin.zh...@intel.com Date: Fri Jun 2 12:35:43 2006 +0800 [PATCH] PCI: fix race with pci_walk_bus and pci_destroy_dev pci_walk_bus has a race with pci_destroy_dev. When cb is called in pci_walk_bus, pci_destroy_dev might unlink the dev pointed by next. Later on in the next loop, pointer next becomes NULL and cause kernel panic. Below patch against 2.6.17-rc4 fixes it by changing pci_bus_lock (spin_lock) to pci_bus_sem (rw_semaphore). Signed-off-by: Zhang Yanmin yanmin.zh...@intel.com Signed-off-by: Greg Kroah-Hartman gre...@suse.de Corresponding email thread is: https://lkml.org/lkml/2006/5/26/38 But from the commit and email thread, I can not find why we need to do that. I ask this question because I want to use pci_walk_bus in a function (in pci runtime resume path) which may be called with device_lock held. Can anyone help me on that? Best Regards, Huang Ying
Re: tbench regression in 2.6.25-rc1
Comparing with kernel 2.6.24, tbench result has regression with 2.6.25-rc1. 1) On 2 quad-core processor stoakley: 4%. 2) On 4 quad-core processor tigerton: more than 30%. bisect located below patch. b4ce92775c2e7ff9cf79cca4e0a19c8c5fd6287b is first bad commit commit b4ce92775c2e7ff9cf79cca4e0a19c8c5fd6287b Author: Herbert Xu <[EMAIL PROTECTED]> Date: Tue Nov 13 21:33:32 2007 -0800 [IPV6]: Move nfheader_len into rt6_info The dst member nfheader_len is only used by IPv6. It's also currently creating a rather ugly alignment hole in struct dst. Therefore this patch moves it from there into struct rt6_info. Above patch changes the cache line alignment, especially member __refcnt. I did a testing by adding 2 unsigned long pading before lastuse, so the 3 members, lastuse/__refcnt/__use, are moved to next cache line. The performance is recovered. I created a patch to rearrange the members in struct dst_entry. With Eric and Valdis Kletnieks's suggestion, I made finer arrangement. 1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y. So sizeof(dst_entry)=200 no matter if CONFIG_NET_CLS_ROUTE=y/n. I tested many patches on my 16-core tigerton by moving tclassid to different place. It looks like tclassid could also have impact on performance. If moving tclassid before metrics, or just don't move tclassid, the performance isn't good. So I move it behind metrics. 2) Add comments before __refcnt. On 16-core tigerton: If CONFIG_NET_CLS_ROUTE=y, the result with below patch is about 18% better than the one without the patch; If CONFIG_NET_CLS_ROUTE=n, the result with below patch is about 30% better than the one without the patch. With 32bit 2.6.25-rc1 on 8-core stoakley, the new patch doesn't introduce regression. Thank Eric, Valdis, and David! Signed-off-by: Zhang Yanmin <[EMAIL PROTECTED]> Acked-by: Eric Dumazet <[EMAIL PROTECTED]> --- --- linux-2.6.25-rc1/include/net/dst.h 2008-02-21 14:33:43.0 +0800 +++ linux-2.6.25-rc1_work/include/net/dst.h 2008-02-22 12:52:19.0 +0800 @@ -52,15 +52,10 @@ struct dst_entry unsigned short header_len; /* more space at head required */ unsigned short trailer_len;/* space to reserve at tail */ - u32 metrics[RTAX_MAX]; - struct dst_entry*path; - - unsigned long rate_last; /* rate limiting for ICMP */ unsigned intrate_tokens; + unsigned long rate_last; /* rate limiting for ICMP */ -#ifdef CONFIG_NET_CLS_ROUTE - __u32 tclassid; -#endif + struct dst_entry*path; struct neighbour*neighbour; struct hh_cache *hh; @@ -70,10 +65,20 @@ struct dst_entry int (*output)(struct sk_buff*); struct dst_ops *ops; - - unsigned long lastuse; + + u32 metrics[RTAX_MAX]; + +#ifdef CONFIG_NET_CLS_ROUTE + __u32 tclassid; +#endif + + /* +* __refcnt wants to be on a different cache line from +* input/output/ops or performance tanks badly +*/ atomic_t__refcnt; /* client references*/ int __use; + unsigned long lastuse; union { struct dst_entry *next; struct rtable*rt_next; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: tbench regression in 2.6.25-rc1
Comparing with kernel 2.6.24, tbench result has regression with 2.6.25-rc1. 1) On 2 quad-core processor stoakley: 4%. 2) On 4 quad-core processor tigerton: more than 30%. bisect located below patch. b4ce92775c2e7ff9cf79cca4e0a19c8c5fd6287b is first bad commit commit b4ce92775c2e7ff9cf79cca4e0a19c8c5fd6287b Author: Herbert Xu [EMAIL PROTECTED] Date: Tue Nov 13 21:33:32 2007 -0800 [IPV6]: Move nfheader_len into rt6_info The dst member nfheader_len is only used by IPv6. It's also currently creating a rather ugly alignment hole in struct dst. Therefore this patch moves it from there into struct rt6_info. Above patch changes the cache line alignment, especially member __refcnt. I did a testing by adding 2 unsigned long pading before lastuse, so the 3 members, lastuse/__refcnt/__use, are moved to next cache line. The performance is recovered. I created a patch to rearrange the members in struct dst_entry. With Eric and Valdis Kletnieks's suggestion, I made finer arrangement. 1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y. So sizeof(dst_entry)=200 no matter if CONFIG_NET_CLS_ROUTE=y/n. I tested many patches on my 16-core tigerton by moving tclassid to different place. It looks like tclassid could also have impact on performance. If moving tclassid before metrics, or just don't move tclassid, the performance isn't good. So I move it behind metrics. 2) Add comments before __refcnt. On 16-core tigerton: If CONFIG_NET_CLS_ROUTE=y, the result with below patch is about 18% better than the one without the patch; If CONFIG_NET_CLS_ROUTE=n, the result with below patch is about 30% better than the one without the patch. With 32bit 2.6.25-rc1 on 8-core stoakley, the new patch doesn't introduce regression. Thank Eric, Valdis, and David! Signed-off-by: Zhang Yanmin [EMAIL PROTECTED] Acked-by: Eric Dumazet [EMAIL PROTECTED] --- --- linux-2.6.25-rc1/include/net/dst.h 2008-02-21 14:33:43.0 +0800 +++ linux-2.6.25-rc1_work/include/net/dst.h 2008-02-22 12:52:19.0 +0800 @@ -52,15 +52,10 @@ struct dst_entry unsigned short header_len; /* more space at head required */ unsigned short trailer_len;/* space to reserve at tail */ - u32 metrics[RTAX_MAX]; - struct dst_entry*path; - - unsigned long rate_last; /* rate limiting for ICMP */ unsigned intrate_tokens; + unsigned long rate_last; /* rate limiting for ICMP */ -#ifdef CONFIG_NET_CLS_ROUTE - __u32 tclassid; -#endif + struct dst_entry*path; struct neighbour*neighbour; struct hh_cache *hh; @@ -70,10 +65,20 @@ struct dst_entry int (*output)(struct sk_buff*); struct dst_ops *ops; - - unsigned long lastuse; + + u32 metrics[RTAX_MAX]; + +#ifdef CONFIG_NET_CLS_ROUTE + __u32 tclassid; +#endif + + /* +* __refcnt wants to be on a different cache line from +* input/output/ops or performance tanks badly +*/ atomic_t__refcnt; /* client references*/ int __use; + unsigned long lastuse; union { struct dst_entry *next; struct rtable*rt_next; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: tbench regression in 2.6.25-rc1
On Tue, 2008-02-19 at 08:40 +0100, Eric Dumazet wrote: > Zhang, Yanmin a �crit : > > On Mon, 2008-02-18 at 12:33 -0500, [EMAIL PROTECTED] wrote: > >> On Mon, 18 Feb 2008 16:12:38 +0800, "Zhang, Yanmin" said: > >> > >>> I also think __refcnt is the key. I did a new testing by adding 2 > >>> unsigned long > >>> pading before lastuse, so the 3 members are moved to next cache line. The > >>> performance is > >>> recovered. > >>> > >>> How about below patch? Almost all performance is recovered with the new > >>> patch. > >>> > >>> Signed-off-by: Zhang Yanmin <[EMAIL PROTECTED]> > >> Could you add a comment someplace that says "refcnt wants to be on a > >> different > >> cache line from input/output/ops or performance tanks badly", to warn some > >> future kernel hacker who starts adding new fields to the structure? > > Ok. Below is the new patch. > > > > 1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y. So > > sizeof(dst_entry)=200 > > no matter if CONFIG_NET_CLS_ROUTE=y/n. I tested many patches on my 16-core > > tigerton by > > moving tclassid to different place. It looks like tclassid could also have > > impact on > > performance. > > If moving tclassid before metrics, or just don't move tclassid, the > > performance isn't > > good. So I move it behind metrics. > > > > 2) Add comments before __refcnt. > > > > If CONFIG_NET_CLS_ROUTE=y, the result with below patch is about 18% better > > than > > the one without the patch. > > > > If CONFIG_NET_CLS_ROUTE=n, the result with below patch is about 30% better > > than > > the one without the patch. > > > > Signed-off-by: Zhang Yanmin <[EMAIL PROTECTED]> > > > > --- > > > > --- linux-2.6.25-rc1/include/net/dst.h 2008-02-21 14:33:43.0 > > +0800 > > +++ linux-2.6.25-rc1_work/include/net/dst.h 2008-02-22 12:52:19.0 > > +0800 > > @@ -52,15 +52,10 @@ struct dst_entry > > unsigned short header_len; /* more space at head required > > */ > > unsigned short trailer_len;/* space to reserve at tail */ > > > > - u32 metrics[RTAX_MAX]; > > - struct dst_entry*path; > > - > > - unsigned long rate_last; /* rate limiting for ICMP */ > > unsigned intrate_tokens; > > + unsigned long rate_last; /* rate limiting for ICMP */ > > > > -#ifdef CONFIG_NET_CLS_ROUTE > > - __u32 tclassid; > > -#endif > > + struct dst_entry*path; > > > > struct neighbour*neighbour; > > struct hh_cache *hh; > > @@ -70,10 +65,20 @@ struct dst_entry > > int (*output)(struct sk_buff*); > > > > struct dst_ops *ops; > > - > > - unsigned long lastuse; > > + > > + u32 metrics[RTAX_MAX]; > > + > > +#ifdef CONFIG_NET_CLS_ROUTE > > + __u32 tclassid; > > +#endif > > + > > + /* > > +* __refcnt wants to be on a different cache line from > > +* input/output/ops or performance tanks badly > > +*/ > > atomic_t__refcnt; /* client references*/ > > int __use; > > + unsigned long lastuse; > > union { > > struct dst_entry *next; > > struct rtable*rt_next; > > > > > > > > I prefer this patch, but unfortunatly your perf numbers are for 64 bits > kernels. > > Could you please test now with 32 bits one ? I tested it with 32bit 2.6.25-rc1 on 8-core stoakley. The result almost has no difference between pure kernel and patched kernel. New update: On 8-core stoakley, the regression becomes 2~3% with kernel 2.6.25-rc2. On tigerton, the regression is still 30% with 2.6.25-rc2. On Tulsa( 8 cores+hyperthreading), the regression is still 4% with 2.6.25-rc2. With my patch, on tigerton, almost all regression disappears. On tulsa, only about 2% regression disappears. So this issue is triggerred with multiple-cpu. Perhaps process scheduler is another factor causing the issue to happen, but it's very hard to change scheduler. Eric, I tested your new patch in function loopback_xmit. It has no improvement, while it doesn't introduce new issues. As you tested it on dual-core machine and got improvement, how about merging your patch with mine? -yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/