Re: [patch] add kdump_after_notifier
Vivek Goyal wrote: > On Tue, Aug 21, 2007 at 06:18:31AM -0700, Jay Lan wrote: > [..] >>>>> Now user will be able to view all the die_chain users through sysfs and >>>>> be able to modify the order in which these should run by modifying their >>>>> priority. Hence all the RAS tools can co-exist. >>>> This is my image of your proposal. >>>> >>>> - Print current order >>>> >>>> # cat /sys/class/misc/debug/panic_notifier_list >>>> priority name >>>> 1 IPMI >>>> 2 watchdog >>>> 3 Kdb >>>> 4 Kdump >>>> >>> I think Bernhard's suggestion looks better here. I noticed that >>> /sys/kernel/debug is already present. So how about following. >>> >>> /sys/kernel/debug/kdump/priority >>> /sys/kernel/debug/kdb/priority >>> /sys/kernel/debug/IPMI/priority >> Why separate priority files is better than a central file? >> At least i think you get a grand picture of priority being >> defined for all parties with a central file? >> > > I thought of couple of reasons. > - A very different syntax to modify the priority. > - Separate directories allow easy future extensions in terms of more > files. For example, putting a small "description" file in each dir > where each registered user can specify what does it do. The first can be easily resolved by providing a comment section in the file with real examples. Users can simply uncomment a line to activate. But future expansion is certainly is a good reason for this layout. > > But I agree that a single file is good for consolidated view. As bernhard > suggested, may be we should also implement a read only file where one > will get a consolidated view. Yep, this will help! > >> What do we decide priority if more than one component has >> the same priority value? >> > > I think first come first serve would be appropriate in this case instead of > returning -EINVAL. How does the kernel process the configuration files? By alphabetic order of the filename? Either way, i think a clear failure/warning dmesg is very important. Thanks, - jay > > Thanks > Vivek > > ___ > kexec mailing list > [EMAIL PROTECTED] > http://lists.infradead.org/mailman/listinfo/kexec - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] add kdump_after_notifier
Vivek Goyal wrote: > On Thu, Aug 16, 2007 at 06:26:35PM +0900, Takenori Nagano wrote: >> Vivek Goyal wrote: >> > So for the time being I think we can put RAS tools on die notifier list >>> and if it runs into issues we can always think of creating a separate list. >>> >>> Few things come to mind. >>> >>> - Why there is a separate panic_notifier_list? Can't it be merged with >>> die_chain? die_val already got one of the event type as PANIC. If there >>> are no specific reasons then we should merge the two lists. Registering >>> RAS tools on a single list is easier. >> I think it is difficult, because die_chain is defined by each architecture. >> > > I think die_chain is arch independent definition (kernel/die_notifier.c)? > But anyway, to begin with it can be done only for panic_notifier. > >>> - Modify Kdump to register on die_chain list. >>> - Modify Kdb to register on die_chain list. >>> - Export all the registered members of die_chain through sysfs along with >>> their priorities. Priorities should be modifiable. Most likely one >>> shall have to introduce additional field in struct notifier_block. This >>> field will be a string as an identifier of the user registerd. e.g >>> "Kdump", "Kdb" etc. >>> >>> Now user will be able to view all the die_chain users through sysfs and >>> be able to modify the order in which these should run by modifying their >>> priority. Hence all the RAS tools can co-exist. >> This is my image of your proposal. >> >> - Print current order >> >> # cat /sys/class/misc/debug/panic_notifier_list >> priority name >> 1 IPMI >> 2 watchdog >> 3 Kdb >> 4 Kdump >> > > I think Bernhard's suggestion looks better here. I noticed that > /sys/kernel/debug is already present. So how about following. > > /sys/kernel/debug/kdump/priority > /sys/kernel/debug/kdb/priority > /sys/kernel/debug/IPMI/priority Why separate priority files is better than a central file? At least i think you get a grand picture of priority being defined for all parties with a central file? What do we decide priority if more than one component has the same priority value? Thanks, - jay > > I think at some point of time we shall have to create another file say > description. > > /sys/kernel/debug/IPMI/description > > Which can tell what does this tool do? Other a user might not have any > clue how to prioritize various things. > > Thanks > Vivek > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Performance Stats: Kernel patch
Andrew Morton wrote: > On Tue, 05 Jun 2007 14:43:50 + Maxim Uvarov <[EMAIL PROTECTED]> wrote: > >> Patch makes available to the user the following >> task and process performance statistics: >> * Involuntary Context Switches (task_struct->nivcsw) >> * Voluntary Context Switches (task_struct->nvcsw) >> >> Statistics information is available from: >> 1. taskstats interface (Documentation/accounting/) >> 2. /proc/PID/status (task only). >> >> This data is useful for detecting hyperactivity >> patterns between processes. >> Signed-off-by: Maxim Uvarov <[EMAIL PROTECTED]> > > Thanks. > > Please retain suitable cc's when discussing a patch. I keep on having to > find your updates on the mailing list, and then adding all the same cc's > when I reply. > > Your patch seemed to be against some horridly prehistoric kernel, so I had > a few things to fix up. > > I disagree with the term "rates" for these things. They are counters, not > rates. Adjustments are below. > > We still need to think about whether we missed any other fields. My initial cut of bringing in general accounting fields was to cover the "struct acct" in linux/acct.h. It was done so that at least data needed by BSD accounting is in "struct taskstats". We need to poll other interested parties into this subject, or we can add more fields later when users of other accounting fields decide to take advantage of the taskstats interface. Thanks, - jay > > > diff -puN > Documentation/accounting/getdelays.c~taskstats-add-context-switch-counters-fix > Documentation/accounting/getdelays.c > --- > a/Documentation/accounting/getdelays.c~taskstats-add-context-switch-counters-fix > +++ a/Documentation/accounting/getdelays.c > @@ -49,7 +49,7 @@ char name[100]; > int dbg; > int print_delays; > int print_io_accounting; > -int print_task_context_switch_rates; > +int print_task_context_switch_counts; > __u64 stime, utime; > > #define PRINTF(fmt, arg...) {\ > @@ -205,7 +205,7 @@ void print_delayacct(struct taskstats *t > "count", "delay total", t->swapin_count, t->swapin_delay_total); > } > > -void task_context_switch_rates(struct taskstats *t) > +void task_context_switch_counts(struct taskstats *t) > { > printf("\n\nTask %15s%15s\n" > " %15lu%15lu\n", > @@ -259,7 +259,7 @@ int main(int argc, char *argv[]) > break; > case 'q': > printf("printing task/process context switch rates\n"); > - print_task_context_switch_rates = 1; > + print_task_context_switch_counts = 1; > break; > case 'w': > logfile = strdup(optarg); > @@ -402,8 +402,8 @@ int main(int argc, char *argv[]) > print_delayacct((struct > taskstats *) NLA_DATA(na)); > if (print_io_accounting) > print_ioacct((struct > taskstats *) NLA_DATA(na)); > - if > (print_task_context_switch_rates) > - > task_context_switch_rates((struct taskstats *) NLA_DATA(na)); > + if > (print_task_context_switch_counts) > + > task_context_switch_counts((struct taskstats *) NLA_DATA(na)); > if (fd) { > if (write(fd, > NLA_DATA(na), na->nla_len) < 0) { > err(1,"write > error\n"); > diff -puN > Documentation/accounting/taskstats-struct.txt~taskstats-add-context-switch-counters-fix > Documentation/accounting/taskstats-struct.txt > --- > a/Documentation/accounting/taskstats-struct.txt~taskstats-add-context-switch-counters-fix > +++ a/Documentation/accounting/taskstats-struct.txt > @@ -22,7 +22,7 @@ There are three different groups of fiel > /* Extended accounting fields end */ > Their values are collected if CONFIG_TASK_XACCT is set. > > -4) Per-task and per-thread context switch rates statistics > +4) Per-task and per-thread context switch count statistics > > Future extension should add fields to the end of the taskstats struct, and > should not change the relative position of each field within the struct. > diff -puN fs/proc/array.c~taskstats-add-context-switch-counters-fix > fs/proc/array.c > --- a/fs/proc/array.c~taskstats-add-context-switch-counters-fix > +++ a/fs/proc/array.c > @@ -290,7 +290,9 @@ static inline char *task_cap(struct task > cap_t(p->cap_permitted), > cap_t(p->cap_effective)); > } > -static inline char *task_context_switch_rates(struct task_struct *p, char
Re: [PATCH] Performance Stats: Kernel patch
Andrew Morton wrote: > On Wed, 30 May 2007 18:49:46 + > Maxim Uvarov <[EMAIL PROTECTED]> wrote: > >> Removed syscall counters from patch. >> >> >> >> >> Patch makes available to the user the following >> task and process performance statistics: >> * Involuntary Context Switches (task_struct->nivcsw) >> * Voluntary Context Switches (task_struct->nvcsw) >> >> Statistics information is available from: >> 1. taskstats interface (Documentation/accounting/) >> 2. /proc/PID/status (task only). >> >> This data is useful for detecting hyperactivity >> patterns between processes. >> Signed-off-by: Maxim Uvarov <[EMAIL PROTECTED]> >> > > A few little things: > >> diff --git a/Documentation/accounting/getdelays.c >> b/Documentation/accounting/getdelays.c >> index e9126e7..18d22ad 100644 >> --- a/Documentation/accounting/getdelays.c >> +++ b/Documentation/accounting/getdelays.c >> @@ -49,6 +49,7 @@ char name[100]; >> int dbg; >> int print_delays; >> int print_io_accounting; >> +int print_task_stats; >> __u64 stime, utime; >> >> #define PRINTF(fmt, arg...) { \ >> @@ -187,7 +188,7 @@ void print_delayacct(struct taskstats *t) >> "IO%15s%15s\n" >> " %15llu%15llu\n" >> "MEM %15s%15s\n" >> - " %15llu%15llu\n\n", >> + " %15llu%15llu\n" >> "count", "real total", "virtual total", "delay total", >> t->cpu_count, t->cpu_run_real_total, t->cpu_run_virtual_total, >> t->cpu_delay_total, >> @@ -196,6 +197,14 @@ void print_delayacct(struct taskstats *t) >> "count", "delay total", t->swapin_count, t->swapin_delay_total); >> } >> >> +void print_taskstats(struct taskstats *t) >> +{ >> +printf("\n\nTask %15s%15s\n" >> + " %15lu%15lu\n", >> + "voluntary", "nonvoluntary", >> + t->nvcsw, t->nivcsw); >> +} > > print_task_stats versus print_taskstats is a bit confusing, but I guess it > doesn't matter. > > More significantly, the whole idea of calling it "task stats" isn't a good > one: it's far too general. The whole kernel interface is called taskstats, > but the additions here are a tiny part of that. > > Perhaps task_context_switch_rates would be more appropriate, although > rather a lot to type. > >> void print_ioacct(struct taskstats *t) >> { >> printf("%s: read=%llu, write=%llu, cancelled_write=%llu\n", >> @@ -227,7 +236,7 @@ int main(int argc, char *argv[]) >> struct msgtemplate msg; >> >> while (1) { >> -c = getopt(argc, argv, "diw:r:m:t:p:v:l"); >> +c = getopt(argc, argv, "qdiw:r:m:t:p:v:l"); >> if (c < 0) >> break; >> >> @@ -240,6 +249,10 @@ int main(int argc, char *argv[]) >> printf("printing IO accounting\n"); >> print_io_accounting = 1; >> break; >> +case 'q': >> +printf("printing task/process stasistics:\n"); >> +print_task_stats = 1; >> +break; >> case 'w': >> strncpy(logfile, optarg, MAX_FILENAME); >> printf("write to file %s\n", logfile); >> @@ -381,6 +394,8 @@ int main(int argc, char *argv[]) >> print_delayacct((struct >> taskstats *) NLA_DATA(na)); >> if (print_io_accounting) >> print_ioacct((struct >> taskstats *) NLA_DATA(na)); >> +if (print_task_stats) >> +print_taskstats((struct >> taskstats *) NLA_DATA(na)); >> if (fd) { >> if (write(fd, >> NLA_DATA(na), na->nla_len) < 0) { >> err(1,"write >> error\n"); >> diff --git a/Documentation/accounting/taskstats-struct.txt >> b/Documentation/accounting/taskstats-struct.txt >> index 661c797..c3ae6a9 100644 >> --- a/Documentation/accounting/taskstats-struct.txt >> +++ b/Documentation/accounting/taskstats-struct.txt >> @@ -22,6 +22,8 @@ There are three different groups of fields in the struct >> taskstats: >> /* Extended accounting fields end */ >> Their values are collected if CONFIG_TASK_XACCT is set. >> >> +4) Per-task and per-thread statistics >> + >> Future extension should add fields to the end of the taskstats struct, and >> should not change the relative position of each field within the struct. >> >> @@ -158,4 +160,8 @@ struct taskstats { >> >> /* Extended accounting fields end */ >> >> +4) Per-task and per-thread statistics >> +__u64 nvcsw; /* Context vo
Re: [PATCH] Performance Stats: Kernel patch
Add Jonathan Lim to cc, who is working on CSA userland implementation to use the taskstats data that this patch is going to remove. Thanks, - jay Andrew Morton wrote: > On Wed, 30 May 2007 18:49:46 + > Maxim Uvarov <[EMAIL PROTECTED]> wrote: > >> Removed syscall counters from patch. >> >> >> >> >> Patch makes available to the user the following >> task and process performance statistics: >> * Involuntary Context Switches (task_struct->nivcsw) >> * Voluntary Context Switches (task_struct->nvcsw) >> >> Statistics information is available from: >> 1. taskstats interface (Documentation/accounting/) >> 2. /proc/PID/status (task only). >> >> This data is useful for detecting hyperactivity >> patterns between processes. >> Signed-off-by: Maxim Uvarov <[EMAIL PROTECTED]> >> > > A few little things: > >> diff --git a/Documentation/accounting/getdelays.c >> b/Documentation/accounting/getdelays.c >> index e9126e7..18d22ad 100644 >> --- a/Documentation/accounting/getdelays.c >> +++ b/Documentation/accounting/getdelays.c >> @@ -49,6 +49,7 @@ char name[100]; >> int dbg; >> int print_delays; >> int print_io_accounting; >> +int print_task_stats; >> __u64 stime, utime; >> >> #define PRINTF(fmt, arg...) { \ >> @@ -187,7 +188,7 @@ void print_delayacct(struct taskstats *t) >> "IO%15s%15s\n" >> " %15llu%15llu\n" >> "MEM %15s%15s\n" >> - " %15llu%15llu\n\n", >> + " %15llu%15llu\n" >> "count", "real total", "virtual total", "delay total", >> t->cpu_count, t->cpu_run_real_total, t->cpu_run_virtual_total, >> t->cpu_delay_total, >> @@ -196,6 +197,14 @@ void print_delayacct(struct taskstats *t) >> "count", "delay total", t->swapin_count, t->swapin_delay_total); >> } >> >> +void print_taskstats(struct taskstats *t) >> +{ >> +printf("\n\nTask %15s%15s\n" >> + " %15lu%15lu\n", >> + "voluntary", "nonvoluntary", >> + t->nvcsw, t->nivcsw); >> +} > > print_task_stats versus print_taskstats is a bit confusing, but I guess it > doesn't matter. > > More significantly, the whole idea of calling it "task stats" isn't a good > one: it's far too general. The whole kernel interface is called taskstats, > but the additions here are a tiny part of that. > > Perhaps task_context_switch_rates would be more appropriate, although > rather a lot to type. > >> void print_ioacct(struct taskstats *t) >> { >> printf("%s: read=%llu, write=%llu, cancelled_write=%llu\n", >> @@ -227,7 +236,7 @@ int main(int argc, char *argv[]) >> struct msgtemplate msg; >> >> while (1) { >> -c = getopt(argc, argv, "diw:r:m:t:p:v:l"); >> +c = getopt(argc, argv, "qdiw:r:m:t:p:v:l"); >> if (c < 0) >> break; >> >> @@ -240,6 +249,10 @@ int main(int argc, char *argv[]) >> printf("printing IO accounting\n"); >> print_io_accounting = 1; >> break; >> +case 'q': >> +printf("printing task/process stasistics:\n"); >> +print_task_stats = 1; >> +break; >> case 'w': >> strncpy(logfile, optarg, MAX_FILENAME); >> printf("write to file %s\n", logfile); >> @@ -381,6 +394,8 @@ int main(int argc, char *argv[]) >> print_delayacct((struct >> taskstats *) NLA_DATA(na)); >> if (print_io_accounting) >> print_ioacct((struct >> taskstats *) NLA_DATA(na)); >> +if (print_task_stats) >> +print_taskstats((struct >> taskstats *) NLA_DATA(na)); >> if (fd) { >> if (write(fd, >> NLA_DATA(na), na->nla_len) < 0) { >> err(1,"write >> error\n"); >> diff --git a/Documentation/accounting/taskstats-struct.txt >> b/Documentation/accounting/taskstats-struct.txt >> index 661c797..c3ae6a9 100644 >> --- a/Documentation/accounting/taskstats-struct.txt >> +++ b/Documentation/accounting/taskstats-struct.txt >> @@ -22,6 +22,8 @@ There are three different groups of fields in the struct >> taskstats: >> /* Extended accounting fields end */ >> Their values are collected if CONFIG_TASK_XACCT is set. >> >> +4) Per-task and per-thread statistics >> + >> Future extension should add fields to the end of the taskstats struct, and >> should not change the relative position of each field within the struct. >> >> @@ -158,4 +160,8 @@ struct taskstats {
Re: [Fastboot] [PATCH 1/1] - platform_kernel_launch_event is noop on generic kernel
Bernhard Walle wrote: > * Bernhard Walle <[EMAIL PROTECTED]> [2007-03-01 13:57]: >> * Horms <[EMAIL PROTECTED]> [2007-03-01 05:22]: >>> On Wed, Feb 28, 2007 at 01:45:17PM -0600, John Keller wrote: >>>> Add a missing #define for the platform_kernel_launch_event. >>>> Without this fix, a call to platform_kernel_launch_event() >>>> becomes a noop on generic kernels. SN systems require this >>>> fix to successfully kdump/kexec from certain hardware errors. >>>> >>>> Signed-off-by: John Keller <[EMAIL PROTECTED]> >>> I made a similar change when porting to xen, but I hadn't thought >>> to see if mainline linux needs it to. >>> >>> Acked-by: Simon Horman <[EMAIL PROTECTED]> >> I think there's an additional change needed. Without that, it leads to >> a NULL pointer dereference. Yep, it is needed. The missing line was in Jack's original patch but was lost somehow in 2.6.20. Acked-by: Jay Lan <[EMAIL PROTECTED]> > > Maybe I should also get my editor right ... ;): > > --- > include/asm-ia64/machvec.h |2 ++ > 1 file changed, 2 insertions(+) > > Index: b/include/asm-ia64/machvec.h > === > --- a/include/asm-ia64/machvec.h > +++ b/include/asm-ia64/machvec.h > @@ -168,6 +168,7 @@ extern void machvec_tlb_migrate_finish ( > # define platform_setup_msi_irq ia64_mv.setup_msi_irq > # define platform_teardown_msi_irq ia64_mv.teardown_msi_irq > # define platform_pci_fixup_bus ia64_mv.pci_fixup_bus > +# define platform_kernel_launch_event ia64_mv.kernel_launch_event > # endif > > /* __attribute__((__aligned__(16))) is required to make size of the > @@ -269,6 +270,7 @@ struct ia64_machine_vector { > platform_setup_msi_irq, \ > platform_teardown_msi_irq, \ > platform_pci_fixup_bus, \ > + platform_kernel_launch_event\ > } > > extern struct ia64_machine_vector ia64_mv; > - > To unsubscribe from this list: send the line "unsubscribe linux-ia64" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] kexec: Fix CONFIG_SMP=n compilation V2 (ia64)
Magnus Damm wrote: > kexec: Fix CONFIG_SMP=n compilation V2 (ia64) > > Kexec support for 2.6.20 on ia64 does not build properly using a config > made up by CONFIG_SMP=n and CONFIG_HOTPLUG_CPU=n: > > CC arch/ia64/kernel/machine_kexec.o > arch/ia64/kernel/machine_kexec.c: In function `machine_shutdown': > arch/ia64/kernel/machine_kexec.c:77: warning: implicit declaration of > function `cpu_down' > AS arch/ia64/kernel/relocate_kernel.o > CC arch/ia64/kernel/crash.o > arch/ia64/kernel/crash.c: In function `kdump_cpu_freeze': > arch/ia64/kernel/crash.c:139: warning: implicit declaration of function > `ia64_jump_to_sal' > arch/ia64/kernel/crash.c:139: error: `sal_boot_rendez_state' undeclared > (first use in this function) > arch/ia64/kernel/crash.c:139: error: (Each undeclared identifier is reported > only once > arch/ia64/kernel/crash.c:139: error: for each function it appears in.) > arch/ia64/kernel/crash.c: At top level: > arch/ia64/kernel/crash.c:84: warning: 'kdump_wait_cpu_freeze' defined but not > used > make[1]: *** [arch/ia64/kernel/crash.o] Error 1 > make: *** [arch/ia64/kernel] Error 2 > > Signed-off-by: Magnus Damm <[EMAIL PROTECTED]> > --- Looks good to me also. Acked-by: Jay Lan <[EMAIL PROTECTED]> > machine_kexec.c > Applies on top of 2.6.20. > > arch/ia64/kernel/crash.c | 11 +++ > arch/ia64/kernel/machine_kexec.c |2 ++ > 2 files changed, 9 insertions(+), 4 deletions(-) > > --- 0002/arch/ia64/kernel/crash.c > +++ work/arch/ia64/kernel/crash.c 2007-02-02 20:31:18.0 +0900 > @@ -79,6 +79,7 @@ crash_save_this_cpu() > final_note(buf); > } > > +#ifdef CONFIG_SMP > static int > kdump_wait_cpu_freeze(void) > { > @@ -91,6 +92,7 @@ kdump_wait_cpu_freeze(void) > } > return 1; > } > +#endif > > void > machine_crash_shutdown(struct pt_regs *pt) > @@ -132,11 +134,12 @@ kdump_cpu_freeze(struct unw_frame_info * > atomic_inc(&kdump_cpu_freezed); > kdump_status[cpuid] = 1; > mb(); > - if (cpuid == 0) { > - for (;;) > - cpu_relax(); > - } else > +#ifdef CONFIG_HOTPLUG_CPU > + if (cpuid != 0) > ia64_jump_to_sal(&sal_boot_rendez_state[cpuid]); > +#endif > + for (;;) > + cpu_relax(); > } > > static int > --- 0002/arch/ia64/kernel/machine_kexec.c > +++ work/arch/ia64/kernel/machine_kexec.c 2007-02-02 20:56:24.0 > +0900 > @@ -70,12 +70,14 @@ void machine_kexec_cleanup(struct kimage > > void machine_shutdown(void) > { > +#ifdef CONFIG_HOTPLUG_CPU > int cpu; > > for_each_online_cpu(cpu) { > if (cpu != smp_processor_id()) > cpu_down(cpu); > } > +#endif > kexec_disable_iosapic(); > } > > - > To unsubscribe from this list: send the line "unsubscribe linux-ia64" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Fastboot] [PATCH] kexec: Fix CONFIG_SMP=n compilation (ia64)
Horms wrote: > On Fri, Feb 02, 2007 at 11:01:21AM +0800, Zou, Nanhai wrote: > void > machine_crash_shutdown(struct pt_regs *pt) > @@ -132,11 +134,12 @@ kdump_cpu_freeze(struct unw_frame_info * > atomic_inc(&kdump_cpu_freezed); > kdump_status[cpuid] = 1; > mb(); > - if (cpuid == 0) { > - for (;;) > - cpu_relax(); > - } else > +#ifdef CONFIG_HOTPLUG_CPU > + if (cpuid != 0) > ia64_jump_to_sal(&sal_boot_rendez_state[cpuid]); > +#endif > + for (;;) > + cpu_relax(); > } I trust ia64_jump_to_sal doesn't return. >>> So do I. The main problem with the compilation seems to be that >>> ia64_jump_to_sal() only exists if CONFIG_HOTPLUG_CPU=y. >>> (include/asm-ia64/sal.h, arch/ia64/kernel/head.S) >>> >> This may cause problem on SN platform. >> I remember SN requires cpu0 return to SAL rendez loop to do IRQ redirection. >> However this needs SGI people to confirm... SN needs slave cpus being returned to SAL rendez loop with the exception of cpu0. It seems there is not a decent way to return cpu0, so we decided to handle cpu0 in PROM. This above code change does not casue problem on SN platform. > > Does this mean that CONFIG_HOTPLUG_CPU may be required for kdump > on the SN platform? The SN platform uses the ia64_jump_to_sal() routine. Thanks, - jay - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Fastboot] [PATCH] Kdump documentation update for 2.6.20: ia64 portion
Horms wrote: > Hi, > > this patch fills in the portions for ia64 kexec. > > I'm actually not sure what options are required for the dump-capture > kernel, but "init 1 irqpoll maxcpus=1" has been working fine for me. > Or more to the point, I'm not sure if irqpoll is needed or not. > > This patch requires the documentation patch update that Vivek Goyal has > been circulating, and I believe is currently in mm. Feel free to fold it > into that change if it makes things easier for anyone. > > Take II > > Nanhai, > > I have noted that vmlinux.gz may also be used. And added a note about the > kernel being able to automatically place the crashkernel region. > Furthermore, I added a note that if manually specified, the region should > be 64Mb aligned to avoid wastage. I notice that the auto placement code > uses 64Mb. But is this strictly neccessary for all page sizes? > > Signed-off-by: Simon Horman <[EMAIL PROTECTED]> > > Index: linux-2.6/Documentation/kdump/kdump.txt > === > --- linux-2.6.orig/Documentation/kdump/kdump.txt 2007-01-12 > 17:45:19.0 +0900 > +++ linux-2.6/Documentation/kdump/kdump.txt 2007-01-12 17:59:42.0 > +0900 > @@ -17,7 +17,7 @@ > memory image to a dump file on the local disk, or across the network to > a remote system. > > -Kdump and kexec are currently supported on the x86, x86_64, ppc64 and IA64 > +Kdump and kexec are currently supported on the x86, x86_64, ppc64 and ia64 > architectures. > > When the system kernel boots, it reserves a small section of memory for > @@ -229,7 +229,23 @@ > > Dump-capture kernel config options (Arch Dependent, ia64) > -- > -(To be filled) > + > +- No specific options are required to create a dump-capture kernel > + for ia64, other than those specified in the arch idependent section > + above. This means that it is possible to use the system kernel > + as a dump-capture kernel if desired. > + > + The crashkernel region can be automatically placed by the system > + kernel at run time. This is done by specifying the base address as 0, > + or omitting it all together. In my testing, i found the base address was ignored. Whatever value specified was fine. Not necessary to be 0. But i guess it is fine to give people a guideline telling them to specify 0. Cheers, - jay > + > + [EMAIL PROTECTED] > + or > + crashkernel=256M > + > + If the start address is specified, not that the start address of the > + kernel will be alligned to 64Mb, so any if the start address is not then > + any space below the alignment point will be wasted. > > > Boot into System Kernel > @@ -248,6 +264,10 @@ > > On ppc64, use "[EMAIL PROTECTED]". > > + On ia64, [EMAIL PROTECTED] is a generous value that typically works. > + The region may be automatically placed on ia64, see the > + dump-capture kernel config option notes above. > + > Load the Dump-capture Kernel > > > @@ -266,7 +286,8 @@ > For ppc64: > - Use vmlinux > For ia64: > - (To be filled) > + - Use vmlinux or vmlinuz.gz > + > > If you are using a uncompressed vmlinux image then use following command > to load dump-capture kernel. > @@ -282,18 +303,19 @@ > --initrd= \ > --append="root= " > > +Please note, that --args-linux does not need to be specified for ia64. > +It is planned to make this a no-op on that architecture, but for now > +it should be omitted > + > Following are the arch specific command line options to be used while > loading dump-capture kernel. > > -For i386 and x86_64: > +For i386, x86_64 and ia64: > "init 1 irqpoll maxcpus=1" > > For ppc64: > "init 1 maxcpus=1 noirqdistrib" > > -For IA64 > - (To be filled) > - > > Notes on loading the dump-capture kernel: > > ___ > fastboot mailing list > fastboot@lists.osdl.org > https://lists.osdl.org/mailman/listinfo/fastboot - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] largefile support for accounting
Yes, our CSA customers reported same problem. CSA already incoporated same fix to our kernel module code. :) Cheers, - jay Peter Staubach wrote: Hi. There is a problem in the accounting subsystem in the kernel can not correctly handle files larger than 2GB. The output file containing the process accounting data can grow very large if the system is large enough and active enough. If the 2GB limit is reached, then the system simply stops storing process accounting data. Another annoying problem is that once the system reaches this 2GB limit, then every process which exits will receive a signal, SIGXFSZ. This signal is generated because an attempt was made to write beyond the limit for the file descriptor. This signal makes it look like every process has exited due to a signal, when in fact, they have not. The solution is to add the O_LARGEFILE flag to the list of flags used to open the accounting file. The rest of the accounting support is already largefile safe. The changes were tested by constructing a large file (just short of 2GB), enabling accounting, and then running enough commands to cause the accounting data generated to increase the size of the file to 2GB. Without the changes, the file grows to 2GB and the last command run in the test script appears to exit due a signal when it has not. With the changes, things work as expected and quietly. There are some user level changes required so that it can deal with largefiles, but those are being handled separately. Signed-off-by: Peter Staubach <[EMAIL PROTECTED]> --- linux-2.6.12/kernel/acct.c.org 2005-06-17 15:48:29.0 -0400 +++ linux-2.6.12/kernel/acct.c 2005-08-10 15:12:46.0 -0400 @@ -220,7 +220,7 @@ asmlinkage long sys_acct(const char __us return (PTR_ERR(tmp)); } /* Difference from BSD - they don't do O_APPEND */ - file = filp_open(tmp, O_WRONLY|O_APPEND, 0); + file = filp_open(tmp, O_WRONLY | O_APPEND | O_LARGEFILE, 0); putname(tmp); if (IS_ERR(file)) { return (PTR_ERR(file)); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 2.6.12-rc1-mm4] fork_connector: add a fork connector
I based my listen program on the fclisten.c posted by Kaigai Kohei with my own modification. Unfortunately i lost my test machine in the lab. I will recreate the listen program Monday. The original listener did not validate sequence number. It also prints length of data and sequence number of every message it receives. My listener prints only out-of-sequence error messages. The fork generator fork-test.c was yours? I called it fork-test and let it run continuously in while-loop: # while 1 # ./fork-test 1000 # sleep 1 # end I let it do 10,000,000 times of fork continuously while the system is running AIM7 and/or ubench. The original fclisten.c and fork-test.c are attached for your reference. - jay Evgeniy Polyakov wrote: On Fri, 08 Apr 2005 20:31:20 -0700 Jay Lan <[EMAIL PROTECTED]> wrote: With the patch you provide to me, i did not see the bugcheck at cn_queue_wrapper() at the console. Unmatched sequence number messages still happened. We expect to lose packets under system stressed situation, but i still observed duplicate messages, which concerned me. What tool and what version do you use as message generator and receiver? Thanks, - jay Evgeniy Polyakov Only failure makes us experts. -- Theo de Raadt #include #include #include #include #include #include #include void usage(){ puts("usage: fclisten "); puts(" Default -> listening fork-connector"); puts(" on -> fork-connector Enable"); puts(" off -> fork-connector Disable"); exit(0); } #define MODE_LISTEN (1) #define MODE_ENABLE (2) #define MODE_DISABLE (3) struct cb_id { __u32 idx; __u32 val; }; struct cn_msg { struct cb_idid; __u32 seq; __u32 ack; __u32 len;/* Length of the following data */ __u8data[0]; }; int main(int argc, char *argv[]){ char buf[4096]; int mode, sockfd, len; struct sockaddr_nl ad; struct nlmsghdr *hdr = (struct nlmsghdr *)buf; struct cn_msg *msg = (struct cn_msg *)(buf+sizeof(struct nlmsghdr)); switch(argc){ case 1: mode = MODE_LISTEN; break; case 2: if (strcasecmp("on",argv[1])==0) { mode = MODE_ENABLE; }else if (strcasecmp("off",argv[1])==0){ mode = MODE_DISABLE; }else{ usage(); } break; default: usage(); break; } if( (sockfd=socket(PF_NETLINK, SOCK_RAW, NETLINK_NFLOG)) < 0 ){ fprintf(stderr, "Fault on socket().\n"); return( 1 ); } ad.nl_family = AF_NETLINK; ad.nl_pad = 0; ad.nl_pid = getpid(); ad.nl_groups = CN_IDX_FORK; if( bind(sockfd, (struct sockaddr *)&ad, sizeof(ad)) ){ fprintf(stderr, "Fault on bind to netlink.\n"); return( 2 ); } if (mode==MODE_LISTEN) { while(-1){ len = recvfrom(sockfd, buf, 4096, 0, NULL, NULL); printf("%d-byte recv Seq=%d\n", len, hdr->nlmsg_seq); } }else{ ad.nl_family = AF_NETLINK; ad.nl_pad = 0; ad.nl_pid = 0; ad.nl_groups = 1; hdr->nlmsg_len = sizeof(struct nlmsghdr) + sizeof(struct cn_msg) + sizeof(int); hdr->nlmsg_type = 0; hdr->nlmsg_flags = 0; hdr->nlmsg_seq = 0; hdr->nlmsg_pid = getpid(); msg->id.idx = 0xfeed; msg->id.val = 0xbeef; msg->seq = msg->ack = 0; msg->len = sizeof(int); if (mode==MODE_ENABLE){ (*(int *)(msg->data)) = 1; } else { (*(int *)(msg->data)) = 0; } sendto(sockfd, buf, sizeof(struct nlmsghdr)+sizeof(struct cn_msg)+sizeof(int), 0, (struct sockaddr *)&ad, sizeof(ad)); } } #include #include int main(int argc, char *argv[]) { int pid; int i = 0, max = 10; struct timeval tv0, tv1; struct timezone tz; long diff; if (argc >= 2) max = atoi(argv[1]); signal(SIGCHLD, SIG_IGN); gettimeofday(&tv0, &tz); while (i++ < max) { pid = fork(); if (pid == 0) { sleep(1); exit (0); } } gettimeofday(&tv1, &tz); diff = (tv1.tv_sec - tv0.tv_sec)*100 + (tv1.tv_usec - tv0.tv_usec); printf("Average per process fork+exit time is %ld usecs [diff=%lu, max=%d].\n", diff/max, diff, max); return 0; }
Re: [patch 2.6.12-rc1-mm4] fork_connector: add a fork connector
With the patch you provide to me, i did not see the bugcheck at cn_queue_wrapper() at the console. Unmatched sequence number messages still happened. We expect to lose packets under system stressed situation, but i still observed duplicate messages, which concerned me. Unmatched seq. Rcvd=79477, expected=79478 <=== duplicate Unmatched seq. Rcvd=713823, expected=713422 * loss of 401 msgs Unmatched seq. Rcvd=80024, expected=79901 * loss of 123 msgs Unmatched seq. Rcvd=93632, expected=93633 <=== duplicate Unmatched seq. Rcvd=94718, expected=93970 Unmatched seq. Rcvd=743576, expected=743502 Unmatched seq. Rcvd=123506, expected=123507 <=== duplicate Unmatched seq. Rcvd=773753, expected=773503 Unmatched seq. Rcvd=124111, expected=123938 Unmatched seq. Rcvd=157172, expected=157173 <=== duplicate Unmatched seq. Rcvd=813024, expected=812913 <=== duplicate Unmatched seq. Rcvd=813024, expected=813025 <=== duplicate Unmatched seq. Rcvd=157830, expected=157501 Unmatched seq. Rcvd=158408, expected=158145 Unmatched seq. Rcvd=813678, expected=813438 The test system was a two cpu ia64. Thanks, - jay Evgeniy Polyakov wrote: On Fri, 08 Apr 2005 15:08:13 -0700 Jay Lan <[EMAIL PROTECTED]> wrote: Hi Evgeniy, Forget about my previous request of a new patch. The failures were straight forward enough to figure out. Ok. The latest sources are always awailable at http://tservice.net.ru/~s0mbre/archive/connector - jay Jay Lan wrote: My workarea was based on 2.6.12-rc1-mm4 plus Guilluame's patch. Your patch caused 5 out of 8 hunks failure at connector.c and one failure at connector.h. Could you generate a new patch based on my version? A tar file of complete source of drivers/connector would work also. :) Thanks! - jay Evgeniy Polyakov wrote: Could you give attached patch a try instead of previous one. It adds gfp mask into cn_netlink_send() call also. If you need updated CBUS sources, feel free to ask, I will send updated sources with Andrew's comments resolved too. I do not know exactly your connector version, so patch will probably be applied with fuzz. feel free to contact if it does not apply, I will send the whole sources. Thank you. * looking for [EMAIL PROTECTED]/connector--main--0--patch-38 to compare with * comparing to [EMAIL PROTECTED]/connector--main--0--patch-38 M connector.c M connector.h M cbus.c * modified files --- orig/drivers/connector/connector.c +++ mod/drivers/connector/connector.c @@ -70,7 +70,7 @@ * then it is new message. * */ -void cn_netlink_send(struct cn_msg *msg, u32 __groups) +void cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask) { struct cn_callback_entry *n, *__cbq; unsigned int size; @@ -102,7 +102,7 @@ size = NLMSG_SPACE(sizeof(*msg) + msg->len); -skb = alloc_skb(size, GFP_ATOMIC); +skb = alloc_skb(size, gfp_mask); if (!skb) { printk(KERN_ERR "Failed to allocate new skb with size=%u.\n", size); return; @@ -119,11 +119,11 @@ #endif NETLINK_CB(skb).dst_groups = groups; - -uskb = skb_clone(skb, GFP_ATOMIC); -if (uskb) +#if 0 +uskb = skb_clone(skb, gfp_mask); +if (uskb && 0) netlink_unicast(dev->nls, uskb, 0, 0); - +#endif netlink_broadcast(dev->nls, skb, 0, groups, GFP_ATOMIC); return; @@ -158,7 +158,7 @@ } spin_unlock_bh(&dev->cbdev->queue_lock); -return found; +return (found)?0:-ENODEV; } /* @@ -181,7 +181,6 @@ "requested msg->len=%u[%u], nlh->nlmsg_len=%u, skb->len=%u.\n", msg->len, NLMSG_SPACE(msg->len + sizeof(*msg)), nlh->nlmsg_len, skb->len); -kfree_skb(skb); return -EINVAL; } #if 0 @@ -215,17 +214,18 @@ skb->len, skb->data_len, skb->truesize, skb->protocol, skb_cloned(skb), skb_shared(skb)); #endif -while (skb->len >= NLMSG_SPACE(0)) { +if (skb->len >= NLMSG_SPACE(0)) { nlh = (struct nlmsghdr *)skb->data; + if (nlh->nlmsg_len < sizeof(struct cn_msg) || skb->len < nlh->nlmsg_len || nlh->nlmsg_len > CONNECTOR_MAX_MSG_SIZE) { -#if 0 +#if 1 printk(KERN_INFO "nlmsg_len=%u, sizeof(*nlh)=%u\n", nlh->nlmsg_len, sizeof(*nlh)); #endif kfree_skb(skb); -break; +goto out; } len = NLMSG_ALIGN(nlh->nlmsg_len); @@ -233,22 +233,11 @@ len = skb->len; err = __cn_rx_skb(skb, nlh); -if (err) { -#if 0 -if (err < 0 && (nlh->nlmsg_flags & NLM_F_ACK)) -netlink_ack(skb, nlh, -err); -#endif -break; -} else { -#if 0 -if (nlh->nlmsg_flags & NLM_F_ACK) -netlink_ack(skb, nlh, 0); -#endif -break; -} -skb_pull(sk
Re: [patch 2.6.12-rc1-mm4] fork_connector: add a fork connector
Hi Evgeniy, Forget about my previous request of a new patch. The failures were straight forward enough to figure out. - jay Jay Lan wrote: My workarea was based on 2.6.12-rc1-mm4 plus Guilluame's patch. Your patch caused 5 out of 8 hunks failure at connector.c and one failure at connector.h. Could you generate a new patch based on my version? A tar file of complete source of drivers/connector would work also. :) Thanks! - jay Evgeniy Polyakov wrote: Could you give attached patch a try instead of previous one. It adds gfp mask into cn_netlink_send() call also. If you need updated CBUS sources, feel free to ask, I will send updated sources with Andrew's comments resolved too. I do not know exactly your connector version, so patch will probably be applied with fuzz. feel free to contact if it does not apply, I will send the whole sources. Thank you. * looking for [EMAIL PROTECTED]/connector--main--0--patch-38 to compare with * comparing to [EMAIL PROTECTED]/connector--main--0--patch-38 M connector.c M connector.h M cbus.c * modified files --- orig/drivers/connector/connector.c +++ mod/drivers/connector/connector.c @@ -70,7 +70,7 @@ * then it is new message. * */ -void cn_netlink_send(struct cn_msg *msg, u32 __groups) +void cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask) { struct cn_callback_entry *n, *__cbq; unsigned int size; @@ -102,7 +102,7 @@ size = NLMSG_SPACE(sizeof(*msg) + msg->len); -skb = alloc_skb(size, GFP_ATOMIC); +skb = alloc_skb(size, gfp_mask); if (!skb) { printk(KERN_ERR "Failed to allocate new skb with size=%u.\n", size); return; @@ -119,11 +119,11 @@ #endif NETLINK_CB(skb).dst_groups = groups; - -uskb = skb_clone(skb, GFP_ATOMIC); -if (uskb) +#if 0 +uskb = skb_clone(skb, gfp_mask); +if (uskb && 0) netlink_unicast(dev->nls, uskb, 0, 0); - +#endif netlink_broadcast(dev->nls, skb, 0, groups, GFP_ATOMIC); return; @@ -158,7 +158,7 @@ } spin_unlock_bh(&dev->cbdev->queue_lock); -return found; +return (found)?0:-ENODEV; } /* @@ -181,7 +181,6 @@ "requested msg->len=%u[%u], nlh->nlmsg_len=%u, skb->len=%u.\n", msg->len, NLMSG_SPACE(msg->len + sizeof(*msg)), nlh->nlmsg_len, skb->len); -kfree_skb(skb); return -EINVAL; } #if 0 @@ -215,17 +214,18 @@ skb->len, skb->data_len, skb->truesize, skb->protocol, skb_cloned(skb), skb_shared(skb)); #endif -while (skb->len >= NLMSG_SPACE(0)) { +if (skb->len >= NLMSG_SPACE(0)) { nlh = (struct nlmsghdr *)skb->data; + if (nlh->nlmsg_len < sizeof(struct cn_msg) || skb->len < nlh->nlmsg_len || nlh->nlmsg_len > CONNECTOR_MAX_MSG_SIZE) { -#if 0 +#if 1 printk(KERN_INFO "nlmsg_len=%u, sizeof(*nlh)=%u\n", nlh->nlmsg_len, sizeof(*nlh)); #endif kfree_skb(skb); -break; +goto out; } len = NLMSG_ALIGN(nlh->nlmsg_len); @@ -233,22 +233,11 @@ len = skb->len; err = __cn_rx_skb(skb, nlh); -if (err) { -#if 0 -if (err < 0 && (nlh->nlmsg_flags & NLM_F_ACK)) -netlink_ack(skb, nlh, -err); -#endif -break; -} else { -#if 0 -if (nlh->nlmsg_flags & NLM_F_ACK) -netlink_ack(skb, nlh, 0); -#endif -break; -} -skb_pull(skb, len); +if (err < 0) +kfree_skb(skb); } - + +out: kfree_skb(__skb); } @@ -310,7 +299,7 @@ m.ack = notify_event; memcpy(&m.id, id, sizeof(m.id)); -cn_netlink_send(&m, ctl->group); +cn_netlink_send(&m, ctl->group, GFP_ATOMIC); } } spin_unlock_bh(¬ify_lock); --- orig/include/linux/connector.h +++ mod/include/linux/connector.h @@ -148,7 +148,7 @@ int cn_add_callback(struct cb_id *, char *, void (* callback)(void *)); void cn_del_callback(struct cb_id *); -void cn_netlink_send(struct cn_msg *, u32); +void cn_netlink_send(struct cn_msg *, u32, int); int cn_queue_add_callback(struct cn_queue_dev *dev, struct cn_callback *cb); void cn_queue_del_callback(struct cn_queue_dev *dev, struct cb_id *id); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 2.6.12-rc1-mm4] fork_connector: add a fork connector
My workarea was based on 2.6.12-rc1-mm4 plus Guilluame's patch. Your patch caused 5 out of 8 hunks failure at connector.c and one failure at connector.h. Could you generate a new patch based on my version? A tar file of complete source of drivers/connector would work also. :) Thanks! - jay Evgeniy Polyakov wrote: Could you give attached patch a try instead of previous one. It adds gfp mask into cn_netlink_send() call also. If you need updated CBUS sources, feel free to ask, I will send updated sources with Andrew's comments resolved too. I do not know exactly your connector version, so patch will probably be applied with fuzz. feel free to contact if it does not apply, I will send the whole sources. Thank you. * looking for [EMAIL PROTECTED]/connector--main--0--patch-38 to compare with * comparing to [EMAIL PROTECTED]/connector--main--0--patch-38 M connector.c M connector.h M cbus.c * modified files --- orig/drivers/connector/connector.c +++ mod/drivers/connector/connector.c @@ -70,7 +70,7 @@ * then it is new message. * */ -void cn_netlink_send(struct cn_msg *msg, u32 __groups) +void cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask) { struct cn_callback_entry *n, *__cbq; unsigned int size; @@ -102,7 +102,7 @@ size = NLMSG_SPACE(sizeof(*msg) + msg->len); - skb = alloc_skb(size, GFP_ATOMIC); + skb = alloc_skb(size, gfp_mask); if (!skb) { printk(KERN_ERR "Failed to allocate new skb with size=%u.\n", size); return; @@ -119,11 +119,11 @@ #endif NETLINK_CB(skb).dst_groups = groups; - - uskb = skb_clone(skb, GFP_ATOMIC); - if (uskb) +#if 0 + uskb = skb_clone(skb, gfp_mask); + if (uskb && 0) netlink_unicast(dev->nls, uskb, 0, 0); - +#endif netlink_broadcast(dev->nls, skb, 0, groups, GFP_ATOMIC); return; @@ -158,7 +158,7 @@ } spin_unlock_bh(&dev->cbdev->queue_lock); - return found; + return (found)?0:-ENODEV; } /* @@ -181,7 +181,6 @@ "requested msg->len=%u[%u], nlh->nlmsg_len=%u, skb->len=%u.\n", msg->len, NLMSG_SPACE(msg->len + sizeof(*msg)), nlh->nlmsg_len, skb->len); - kfree_skb(skb); return -EINVAL; } #if 0 @@ -215,17 +214,18 @@ skb->len, skb->data_len, skb->truesize, skb->protocol, skb_cloned(skb), skb_shared(skb)); #endif - while (skb->len >= NLMSG_SPACE(0)) { + if (skb->len >= NLMSG_SPACE(0)) { nlh = (struct nlmsghdr *)skb->data; + if (nlh->nlmsg_len < sizeof(struct cn_msg) || skb->len < nlh->nlmsg_len || nlh->nlmsg_len > CONNECTOR_MAX_MSG_SIZE) { -#if 0 +#if 1 printk(KERN_INFO "nlmsg_len=%u, sizeof(*nlh)=%u\n", nlh->nlmsg_len, sizeof(*nlh)); #endif kfree_skb(skb); - break; + goto out; } len = NLMSG_ALIGN(nlh->nlmsg_len); @@ -233,22 +233,11 @@ len = skb->len; err = __cn_rx_skb(skb, nlh); - if (err) { -#if 0 - if (err < 0 && (nlh->nlmsg_flags & NLM_F_ACK)) -netlink_ack(skb, nlh, -err); -#endif - break; - } else { -#if 0 - if (nlh->nlmsg_flags & NLM_F_ACK) -netlink_ack(skb, nlh, 0); -#endif - break; - } - skb_pull(skb, len); + if (err < 0) + kfree_skb(skb); } - + +out: kfree_skb(__skb); } @@ -310,7 +299,7 @@ m.ack = notify_event; memcpy(&m.id, id, sizeof(m.id)); - cn_netlink_send(&m, ctl->group); + cn_netlink_send(&m, ctl->group, GFP_ATOMIC); } } spin_unlock_bh(¬ify_lock); --- orig/include/linux/connector.h +++ mod/include/linux/connector.h @@ -148,7 +148,7 @@ int cn_add_callback(struct cb_id *, char *, void (* callback)(void *)); void cn_del_callback(struct cb_id *); -void cn_netlink_send(struct cn_msg *, u32); +void cn_netlink_send(struct cn_msg *, u32, int); int cn_queue_add_callback(struct cn_queue_dev *dev, struct cn_callback *cb); void cn_queue_del_callback(struct cn_queue_dev *dev, struct cb_id *id); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 2.6.12-rc1-mm4] fork_connector: add a fork connector
BTW, when it happened last time, my program listening to the socket complained about duplicate messages received. Unmatched seq. Rcvd=1824062, expected=1824061 <=== Unmatched seq. Rcvd=1824062, expected=1824063 <=== Unmatched seq. Rcvd=1824348, expected=1824307 When my program received 1824062 while expecting 1824061 it adjusted itself to expect the next one being 1824063. But instead the message of 1824062 arrived the second time. - jay Jay Lan wrote: Hi Evgeniy, Should i be concerned about this bugcheck? I have seen this happening a number of times, all with the same signature in my testing. I ran a mix of AIM7, ubench, fork-test (continuously fork new processes), and another program reading from the fork connector socket. Thanks, - jay cqueue/1[656]: bugcheck! 0 [1] Modules linked in: nfs lockd sunrpc sg st sr_mod ipv6 usbcore Pid: 656, CPU 1, comm: cqueue/1 psr : 1010085a6010 ifs : 8289 ip : []Not tainted ip is at __kfree_skb+0x1b0/0x220 unat: pfs : 0289 rsc : 0003 rnat: bsps: pr : 9641 ldrs: ccv : fpsr: 0009804c8a70433f csd : ssd : b0 : a001005cee50 b6 : a001e7e0 b7 : a001003ae440 f6 : 0fffbc8c0 f7 : 0ffdaa200 f8 : 18000 f9 : 10002a000 f10 : 0fffcc8c0 f11 : 1003e r1 : a00100c0ec00 r2 : 4000 r3 : 4000 r8 : 0028 r9 : a001008eaac8 r10 : 0004 r11 : 0028 r12 : e0307a99fd60 r13 : e0307a998000 r14 : a00100887c00 r15 : a00100a24b18 r16 : a00100a22e18 r17 : r18 : a00100887bec r19 : a00100a9080f r20 : 3517 r21 : 000f r22 : 0034 r23 : 0034 r24 : a00100a90810 r25 : 3518 r26 : 3518 r27 : 0010085a6010 r28 : a00100a90811 r29 : 3519 r30 : r31 : a00100a24ae8 Call Trace: [] show_stack+0x80/0xa0 sp=e0307a99f920 bsp=e0307a999078 [] show_regs+0x840/0x880 sp=e0307a99faf0 bsp=e0307a999018 [] die+0x150/0x1c0 sp=e0307a99fb00 bsp=e0307a998fd0 [] die_if_kernel+0x40/0x60 sp=e0307a99fb00 bsp=e0307a998fa0 [] ia64_bad_break+0x300/0x380 sp=e0307a99fb00 bsp=e0307a998f78 [] ia64_leave_kernel+0x0/0x280 sp=e0307a99fb90 bsp=e0307a998f78 [] __kfree_skb+0x1b0/0x220 sp=e0307a99fd60 bsp=e0307a998f30 [] kfree_skb+0x50/0xa0 sp=e0307a99fd60 bsp=e0307a998f10 [] cn_queue_wrapper+0xe0/0x100 sp=e0307a99fd60 bsp=e0307a998ee8 [] worker_thread+0x3e0/0x520 sp=e0307a99fd60 bsp=e0307a998e60 [] kthread+0x290/0x300 sp=e0307a99fdd0 bsp=e0307a998e20 [] kernel_thread_helper+0xe0/0x100 sp=e0307a99fe30 bsp=e0307a998df0 [] start_kernel_thread+0x20/0x40 sp=e0307a99fe30 bsp=e0307a998df0 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 2.6.12-rc1-mm4] fork_connector: add a fork connector
Hi Evgeniy, Should i be concerned about this bugcheck? I have seen this happening a number of times, all with the same signature in my testing. I ran a mix of AIM7, ubench, fork-test (continuously fork new processes), and another program reading from the fork connector socket. Thanks, - jay cqueue/1[656]: bugcheck! 0 [1] Modules linked in: nfs lockd sunrpc sg st sr_mod ipv6 usbcore Pid: 656, CPU 1, comm: cqueue/1 psr : 1010085a6010 ifs : 8289 ip : []Not tainted ip is at __kfree_skb+0x1b0/0x220 unat: pfs : 0289 rsc : 0003 rnat: bsps: pr : 9641 ldrs: ccv : fpsr: 0009804c8a70433f csd : ssd : b0 : a001005cee50 b6 : a001e7e0 b7 : a001003ae440 f6 : 0fffbc8c0 f7 : 0ffdaa200 f8 : 18000 f9 : 10002a000 f10 : 0fffcc8c0 f11 : 1003e r1 : a00100c0ec00 r2 : 4000 r3 : 4000 r8 : 0028 r9 : a001008eaac8 r10 : 0004 r11 : 0028 r12 : e0307a99fd60 r13 : e0307a998000 r14 : a00100887c00 r15 : a00100a24b18 r16 : a00100a22e18 r17 : r18 : a00100887bec r19 : a00100a9080f r20 : 3517 r21 : 000f r22 : 0034 r23 : 0034 r24 : a00100a90810 r25 : 3518 r26 : 3518 r27 : 0010085a6010 r28 : a00100a90811 r29 : 3519 r30 : r31 : a00100a24ae8 Call Trace: [] show_stack+0x80/0xa0 sp=e0307a99f920 bsp=e0307a999078 [] show_regs+0x840/0x880 sp=e0307a99faf0 bsp=e0307a999018 [] die+0x150/0x1c0 sp=e0307a99fb00 bsp=e0307a998fd0 [] die_if_kernel+0x40/0x60 sp=e0307a99fb00 bsp=e0307a998fa0 [] ia64_bad_break+0x300/0x380 sp=e0307a99fb00 bsp=e0307a998f78 [] ia64_leave_kernel+0x0/0x280 sp=e0307a99fb90 bsp=e0307a998f78 [] __kfree_skb+0x1b0/0x220 sp=e0307a99fd60 bsp=e0307a998f30 [] kfree_skb+0x50/0xa0 sp=e0307a99fd60 bsp=e0307a998f10 [] cn_queue_wrapper+0xe0/0x100 sp=e0307a99fd60 bsp=e0307a998ee8 [] worker_thread+0x3e0/0x520 sp=e0307a99fd60 bsp=e0307a998e60 [] kthread+0x290/0x300 sp=e0307a99fdd0 bsp=e0307a998e20 [] kernel_thread_helper+0xe0/0x100 sp=e0307a99fe30 bsp=e0307a998df0 [] start_kernel_thread+0x20/0x40 sp=e0307a99fe30 bsp=e0307a998df0 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 2.6.12-rc1-mm4] fork_connector: add a fork connector
Andrew Morton wrote: Guillaume Thouvenin <[EMAIL PROTECTED]> wrote: This patch adds a fork connector in the do_fork() routine. ... The fork connector is used by the Enhanced Linux System Accounting project http://elsa.sourceforge.net Does it also meet all the in-kernel requirements for other accounting projects? If not, what else do those other projects need? Hi Andrew, As the discussion in this thread of the past few days showed, this patch intends to take care of process grouping, but not the accounting data collection. Besides my concern of possibility of data loss, this patch also provides CSA information to handle process grouping as it intends to do. I plan to run some testing to see percentage of data loss when system is under stress test, but improvement at CBUS as Evgeniy indicated should help! Please be advised that i still need an do_exit handling to save accounting data. But, it is a separate issue. Thanks, - jay - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/2] fork_connector: add a fork connector
The parent information ((ppid,pid) pair) is useful for process group aggregation while do_exit() hook is needed to save per task accounting data before the task data is disposed. Thanks, - jay dean gaudet wrote: On Tue, 29 Mar 2005, Jay Lan wrote: The fork_connector is not designed to solve accounting data collection problem. The accounting data collection must be done via a hook from do_exit(). by the time do_exit() occurs the parent may have disappeared... you do need to record something at fork() time so that you can account to the correct ancestor. an example of where this ancestry is useful would be the summation of all cpu time spent by children of apache, spamd, clamd, ... -dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/2] fork_connector: add a fork connector
Paul, The fork_connector is not designed to solve accounting data collection problem. The accounting data collection must be done via a hook from do_exit(). The acct_process() hook invokes do_acct_process() to write BSD accounting data to disk. CSA needs a similar hook off do_exit() to collect more accounting data and write to disk in different accounting records format. This part is not part of fork_connector. ELSA does not care how accounting data being written to disks. However, it needs the accounting data being reliably and accurately collected and written to disk by BSD and/or CSA and it needs the information to aggregate processes. It was never the fork_connector's intention to piggy back the data to the accounting file. Thanks, - jay Paul Jackson wrote: Evgeniy writes: Here forking connector module "exits" and can handle next fork() on the same CPU. Fine ... but it's not about what the fork_connector does. It's about getting the accounting data to disk, if I understand correctly. That is why it is very fast in "fast-path". I don't care how fast a tool is. I care how fast the job gets done. If a tool is only doing part of the job, then we can't decide whether to use that tool just based on how fast that part of the job gets done. The most expensive part is cn_netlink_send()/netlink_broadcast(), with CBUS it is deferred to the safe time, This is "safe time" for the immediate purpose of letting the forking process continue on its way. But the deferred work of buffering up the data and writing it to disk still needs to be done, pretty soon. When sizing a system to see how many users or jobs I can run on it at a time, I will have to include sufficient cpu, memory and disk i/o to handle getting this accounting data to disk, right? 2) Using a modified form of what BSD ACCOUNTING does now: - forking process appends single fork data to in-kernel buffer It is not as simple. It takes global locks several times, it access bunch of shared between CPU data. It calls ->stat() and ->write() which may sleep. Hmmm ... good points. The mechanisms in the kernel now (and for the last 25 years) to write out BSD ACCOUNTING data may not be numa friendly. Perhaps there should be a per-cpu 512 byte buffer, which can gather up 8 accounting records (64 bytes each) and only call the file system write once every 8 task exits. Or perhaps a per-node buffer, with a spinlock to serialize access by the CPUs on that node. Or perhaps per-node accounting files. Or something like that. Guillaume, Jay - do we (you ?) need to make classic BSD ACCOUNTING data collection numa friendly? Based on the various frustrated comments at the top of kernel/acct.c, this could be a non-trivial effort to get right. Maybe we need it, but can't afford it. And perhaps my proposed variable length records for supplementary accounting, such as from fork, need to allow for some way to pad out the rest of a buffer, when the next record won't fit entirely. That work is deferred and does not affect in-kernel processes. The accounting data collection cannot be deferred for long, perhaps just a few minutes. Not until the data hits the disk can we rest indefinitely. Unless, that is, I don't understand what problem is being solved here (quite possible ;). And why userspace fork connector should write data to the disk? I NEVER said it should. I am NOT trying to redesign fork_connector. Good grief ... how many times and ways do I have to say this ;)? I am asking what is the best tool for accounting data collection, which, if I understand correctly, does need to write to disk. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/2] fork_connector: add a fork connector
Paul Jackson wrote: Guillaume wrote: The goal of the fork connector is to inform a user space application that a fork occurs in the kernel. This information (cpu ID, parent PID and child PID) can be used by several user space applications. It's not only for accounting. Accounting and fork_connector are two different things and thus, fork_connector doesn't do the merge of any kinds of data (and it will never do). Yes - it is clear that the fork_connector does this - inform user space of fork information . I'm not saying that fork_connector should merge data; I'm observing that it doesn't, and that this would seem to serve the needs of accounting poorly. Paul, You probably can look at it this way: the accounting data being written out by BSD are per process data and the fork connector provides information needed to group processes into process aggregates. Thanks, - jay Out of curiosity, what are these 'several user space applications?' The only one I know of is this extension to bsd accounting to include capturing parent and child pid at fork. Probably you've mentioned some other uses of fork_connector before here, but I missed it. The relayfs is done, like Evgeniy said, for large amount of datas. So I think that it's not suitable for what we want to achieve with the fork connector. I never claimed that relayfs was appropriate for fork_connector. I'm not trying to tape a rock to Evgeniy's screwdriver. I'm saying that accounting looks like a nail to me, so let us see what rocks and hammers we have in our tool box. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/2] fork_connector: add a fork connector
Evgeniy Polyakov wrote: On Tue, 22 Mar 2005 10:26:19 -0800 Ram <[EMAIL PROTECTED]> wrote: On Mon, 2005-03-21 at 23:07, Guillaume Thouvenin wrote: On Mon, 2005-03-21 at 12:52 -0800, Ram wrote: If a bunch of applications are listening for fork events, your patch allows any application to turn off the fork event notification? Is this the right behavior? Yes it is. The main management is done by application so, if several applications are listening for fork events you need to choose which one will turn off the fork connector. I want to keep this turn on/off mechanism simple but if it's needed I can manage the variable "cn_fork_enable" as a counter. Thus the callback could be something like: static void cn_fork_callback(void *data) { int start; struct cn_msg *msg = (struct cn_msg *)data; if (cn_already_initialized && (msg->len == sizeof(cn_fork_enable))) { memcpy(&start, msg->data, sizeof(cn_fork_enable)); if (start) cn_fork_enable++; else cn_fork_enable > 0 ? cn_fork_enable-- : 0; } } I think a better way is: Providing a different connector channel called the administrator channel which can be used only by a super-user, and gives you the ability to switch on or off any connector channel including the fork-connector channel. Only super-user can bind netlink socket to multicast group. For lack of better term I am using the word 'channel' to mean something that carries events of particular type through the connector-infrastructure. I still do not see why it is needed. Super-user can run ip command and turn network interface off not waiting while apache or named exits or unbind. You can turn off a network interface and turn it back on without closing a network application and it will continue to work. In theory I can create some kind of userspace registration mechanism, when userspace application reports it's pid to the connector, and then it sends data to the specified pids, but does not allow controlling from userspace. But I really do not think it is a good idea to permit non-priviledged userspace processes to know about deep kernel internals through connector's messages. I see this issue less a case of bad guys vs. good guys. I see it as various components doing system related work, but there is no mechanism of knowing who is on who is off by today's patch. A service listening to the fork connector can request to turn off cn_fork_enable on exit and inadquately affect other services/daemons listening to the same connector. It is not acceptable in my opinion. The idea of implementing fork connector enabling/disabling was so that the kernel does not waste time writing to the sockets if no application listening. If implementing such a feature costs more than it saves, maybe the fork connector should simply always send? Thanks, - jay RP What do you think about this implementation? Guillaume Evgeniy Polyakov Only failure makes us experts. -- Theo de Raadt - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/2] fork_connector: add a fork connector
Ram wrote: On Tue, 2005-03-22 at 11:22, Evgeniy Polyakov wrote: On Tue, 22 Mar 2005 10:26:19 -0800 Ram <[EMAIL PROTECTED]> wrote: On Mon, 2005-03-21 at 23:07, Guillaume Thouvenin wrote: On Mon, 2005-03-21 at 12:52 -0800, Ram wrote: If a bunch of applications are listening for fork events, your patch allows any application to turn off the fork event notification? Is this the right behavior? Yes it is. The main management is done by application so, if several applications are listening for fork events you need to choose which one will turn off the fork connector. I want to keep this turn on/off mechanism simple but if it's needed I can manage the variable "cn_fork_enable" as a counter. Thus the callback could be something like: static void cn_fork_callback(void *data) { int start; struct cn_msg *msg = (struct cn_msg *)data; if (cn_already_initialized && (msg->len == sizeof(cn_fork_enable))) { memcpy(&start, msg->data, sizeof(cn_fork_enable)); if (start) cn_fork_enable++; else cn_fork_enable > 0 ? cn_fork_enable-- : 0; } } I think a better way is: Providing a different connector channel called the administrator channel which can be used only by a super-user, and gives you the ability to switch on or off any connector channel including the fork-connector channel. Only super-user can bind netlink socket to multicast group. ok. I did not realize that. For lack of better term I am using the word 'channel' to mean something that carries events of particular type through the connector-infrastructure. I still do not see why it is needed. Super-user can run ip command and turn network interface off not waiting while apache or named exits or unbind. In theory I can create some kind of userspace registration mechanism, when userspace application reports it's pid to the connector, and then it sends data to the specified pids, but does not allow controlling from userspace. But I really do not think it is a good idea to permit non-priviledged userspace processes to know about deep kernel internals through connector's messages. Yes. non-priviledged userspace processes should not know any deep kernel internals through connector events. I think what I am driving at is, an application that is critically dependent on the fork-notification, suddenly stops receiving such notification because some other application has switched off the service without its notice. the reason I am concerned is I am planning to feed this fork-events to my in-kernel module. Side note: I would really like support for in-kernel listners through connector infrastructure. Listners in the kernel? Sound like a PAGG thing again! Guillaume's stuff was originally an in-kernel module. CSA/Job are kernel modules also. We all use hook management feature of PAGG, which offer event notification to other kernel components. Guillaume moved to connector approach and i am moving to that direction too because Andrew likes to see in-kernel modules moved to user space. It is kind of funny to see someone trying to get a notification through a connector and then feed it back to the kernel! This does not sound right to me! - jay RP RP What do you think about this implementation? Guillaume Evgeniy Polyakov Only failure makes us experts. -- Theo de Raadt - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/2] fork_connector: add a fork connector
Guillaume Thouvenin wrote: On Mon, 2005-03-21 at 12:52 -0800, Ram wrote: If a bunch of applications are listening for fork events, your patch allows any application to turn off the fork event notification? Is this the right behavior? Yes it is. The main management is done by application so, if several applications are listening for fork events you need to choose which one will turn off the fork connector. It is not practical. One listener never know who else are listening to the fork connector. We also need to protect yet another global variable "cn_fork_enable". Also, in order to implement "cn_fork_enable" as a counter, we need some sort of registration to prevent an application from sending repeated "off" notification. One can only turn off its own switch. Thanks, - jay I want to keep this turn on/off mechanism simple but if it's needed I can manage the variable "cn_fork_enable" as a counter. Thus the callback could be something like: static void cn_fork_callback(void *data) { int start; struct cn_msg *msg = (struct cn_msg *)data; if (cn_already_initialized && (msg->len == sizeof(cn_fork_enable))) { memcpy(&start, msg->data, sizeof(cn_fork_enable)); if (start) cn_fork_enable++; else cn_fork_enable > 0 ? cn_fork_enable-- : 0; } } What do you think about this implementation? Guillaume - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: I/O and Memory accounting...
I thought you planned to read from CSA pacct file? Well, while we are in discussion of whether to merge and replace BSD accounting with CSA accounting, your proposed change will provide you data on charater I/O in a BSD pacct file. I supposed you do not need to have seperate fields on character-read and character-written? CSA will provide the data separately. CSA writes the data to a CSA pacct file in a similar way as BSD on exit callback at do_acct_process(). The CSA's exit callback is implemented as a loadable module. The CSA project and code can be downloaded at http://oss.sgi.com/projects/csa. Cheers, - jay Guillaume Thouvenin wrote: Hello, In the ChangeLog-2.6.11 file I read that the enhanced I/O accounting data patch and the enhanced memory accounting data collection patch were added. It's cool but I don't see how this stuff is used because information is never dump in a file or send to an accounting application (or I miss something). Maybe we should update the ac_io in the "struct acct"? Thus, values will be dump in the accounting file. Maybe it could be something like: --- acct.c.orig 2005-03-09 14:17:07.0 +0100 +++ acct.c 2005-03-09 14:18:20.0 +0100 @@ -477,8 +477,8 @@ static void do_acct_process(long exitcod } vsize = vsize / 1024; ac.ac_mem = encode_comp_t(vsize); - ac.ac_io = encode_comp_t(0 /* current->io_usage */); /* %% */ - ac.ac_rw = encode_comp_t(ac.ac_io / 1024); + ac.ac_io = encode_comp_t(current->rchar + current->wchar); + ac.ac_rw = encode_comp_t(0); ac.ac_minflt = encode_comp_t(current->signal->min_flt + current->group_leader->min_flt); ac.ac_majflt = encode_comp_t(current->signal->maj_flt + For memory and read/write syscall we may add new fields. Best regards, Guillaume - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.11-rc4-mm1] end-of-proces handling for acct-csa
The patch i propose is tiny, simple and straight forward. It touches only one file and leaves the CSA code in a configurable loadable module. It broke nobody's code and it does not need to redesign existing BSD kernel code and utilities. If we are to merge the code, there are some detailed discussion needed to happen on implementation deail. We are talking about supporting two different internal formats by one piece of code. We need to maintain BSD acct format because BSD utilities count on it. If we are to combine two formats into one, we then need to modify BSD utilies to understand the new format. How about the backwards compatibility? Now this is really an overkill. All i asked for was only adding a few lines to acct.c. Thanks, - jay Tim Schmielau wrote: On Wed, 2 Mar 2005, Guillaume Thouvenin wrote: Is it possible to merge BSD and CSA? I mean with CSA, there is a part that does per-process accounting. For exemple in the linux-2.6.9.acct_mm.patch the two functions update_mem_hiwater() and csa_update_integrals() update fields in the current (and parent) process. So maybe you can improve the BSD per-process accounting or maybe CSA can replace the BSD per-process accounting? Yes, that was also my preferred direction - make CSA able to also write BSD acct format, and replace the existing BSD accounting with CSA. However it seems this will still increase the amount of kernel code quite a bit. Sorry for not going into any details, I have to leave right now and will be offline for two weeks. Tim - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.11-rc4-mm1] end-of-proces handling for acct-csa
I did not look into the userspace commands supported in BSD accounting on the dependency on the format of /var/account/pacct file. The accounting exit hook allows BSD/CSA to save accounting data stored in task_struct to internally kept data structure and then writes to their respective accounting file. The format of the accounting output file of the two are different, with CSA being the superset of BSD and also with CSA carrying concept of grouping of processes. It would be an interesting project to merge BSD and CSA, but it is going to take some surgical work on both products, not an easy one. - jay Guillaume Thouvenin wrote: On Tue, 2005-03-01 at 10:06 -0800, Jay Lan wrote: Sorry I was not clear on my point. I was trying to point out that, an exit hook for BSD and CSA is essential to save accounting data before the data is gone. That can not be done with a netlink. So, my patch was to keep acct_process as a wrapper, which would then call do_exit_csa() for CSA and call do_acct_process for BSD. Is it possible to merge BSD and CSA? I mean with CSA, there is a part that does per-process accounting. For exemple in the linux-2.6.9.acct_mm.patch the two functions update_mem_hiwater() and csa_update_integrals() update fields in the current (and parent) process. So maybe you can improve the BSD per-process accounting or maybe CSA can replace the BSD per-process accounting? Guillaume - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.11-rc4-mm1] end-of-proces handling for acct-csa
Sorry I was not clear on my point. I was trying to point out that, an exit hook for BSD and CSA is essential to save accounting data before the data is gone. That can not be done with a netlink. So, my patch was to keep acct_process as a wrapper, which would then call do_exit_csa() for CSA and call do_acct_process for BSD. Thanks, - jay Guillaume Thouvenin wrote: On Mon, 2005-02-28 at 10:56 -0800, Jay Lan wrote: The exit hook is essential for CSA to save off data before the data is gone, A netlink type of thing does not help. BSD is in the same situation. You can not replace the acct_process() call with a netlink. If ELSA is to use the enhanced accounting data, it needs the CSA eop handling at exit as well. Why replace the acct_process()? The problem here is to add a new hook in the do_fork() and you can use the BSD accounting hook acct_process() which is already in the exit() routine. We don't need to replace it with a netlink because today there are no user space applications that need it. Best regards, Guillaume - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.11-rc4-mm1] end-of-proces handling for acct-csa
Hi Andrew, You asked: > > In other words: given that ELSA can do its thing via existing accounting > interfaces and a fork notifier, why does CSA need to add lots more kernel > code? And i explained: > Here are some codes from do_exit() starting line 813 (based on > 2.6.11-rc4-mm1): > > 813acct_update_integrals(tsk); > 814update_mem_hiwater(tsk); > 815group_dead = atomic_dec_and_test(&tsk->signal->live); > 816if (group_dead) { > 817del_timer_sync(&tsk->signal->real_timer); > 818acct_process(code); > 819} > 820exit_mm(tsk); > > The acct_process() is called to save off BSD accounting data at > line 818. The next statement at 820, tsk->mm is disposed and all > data saved at tsk->mm is gone, including memory hiwater marks > information saved at line 814. The complete tsk is disposed > before exit of do_exit() routine. I was hoping Guilluame could jump in himself... But, in a separate discussion thread, he wrote: >> (Jay asked) >> I spent some time trying to understand how ELSA save the per-process >> accounting data before the task_struct data structure gets disposed, >> but failed to find anything. My assumption would be that ELSA does >> not collection those data in the kernel itself? Instead, it would >> read the pacct file created by either BSD or CSA? > > Yes you're right, ELSA reads accounting file created by BSD or CSA. We > don't make accounting. I think that the communication problem is here Guilluame's reply should answer your question. ELSA does not collect accounting data in the kernel as BSD/CSA does. It uses accounting data collected by BSD/CSA. The exit hook is essential for CSA to save off data before the data is gone, A netlink type of thing does not help. BSD is in the same situation. You can not replace the acct_process() call with a netlink. If ELSA is to use the enhanced accounting data, it needs the CSA eop handling at exit as well. Thanks, - jay Guillaume Thouvenin wrote: On Thu, 2005-02-24 at 20:46 -0800, Andrew Morton wrote: Jay Lan <[EMAIL PROTECTED]> wrote: Since my idea of providing an accounting framework was considered 'overkill', here i submit a tiny patch just to allow CSA to handle end-of-process (eop) situation by saving off accounting data before a task_struct is disposed. This patch is to modify the acct_process() in acct.c, which is invoked from do_exit() to handle eop for BSD accounting. Now the acct_process() wrapper will also take care of CSA, if it has been loaded. If the CSA module has been loaded, a CSA routine will be invoked to construct a CSA job record and to write the record to the CSA accounting file. I really don't want to have to (and shouldn't need to) become an accounting person, but as there seems to be a communication problem somewhere.. Please, you guys are the subject matter experts. Put your heads together and come up with something. I completely agree with that and we will continue this conversation in private with Jay and all people involved to come up with an appropriate solution. Guillaume - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Lse-tech] Re: A common layer for Accounting packages
Andrew Morton wrote: Jay Lan <[EMAIL PROTECTED]> wrote: Andrew Morton wrote: > Kaigai Kohei <[EMAIL PROTECTED]> wrote: > >>In my understanding, what Andrew Morton said is "If target functionality can >> implement in user space only, then we should not modify the kernel-tree". > > > fork, exec and exit upcalls sound pretty good to me. As long as > > a) they use the same common machinery and > > b) they are next-to-zero cost if something is listening on the netlink >socket but no accounting daemon is running. > > Question is: is this sufficient for CSA? Yes, fork, exec, and exit upcalls are sufficient for CSA. The framework i proposed earlier should satisfy your requirement a and b, and provides upcalls needed by BSD, ELSA and CSA. Maybe i misunderstood your concern of the 'very light weight' framework i proposed besides being "overkill"? "upcall" is poorly defined. What I meant was that ELSA can perform its function when the kernel merely sends asynchronous notifications of forks out to userspace via netlink. Further, I'm wondering if CSA can perform its function with the same level of kernel support, perhaps with the addition of netlink-based notification of exec and exit as well. The framework patch which you sent was designed to permit the addition of more kernel accounting code, which is heading in the opposite direction. In other words: given that ELSA can do its thing via existing accounting interfaces and a fork notifier, why does CSA need to add lots more kernel code? Here are some codes from do_exit() starting line 813 (based on 2.6.11-rc4-mm1): 813acct_update_integrals(tsk); 814update_mem_hiwater(tsk); 815group_dead = atomic_dec_and_test(&tsk->signal->live); 816if (group_dead) { 817del_timer_sync(&tsk->signal->real_timer); 818acct_process(code); 819} 820exit_mm(tsk); The acct_process() is called to save off BSD accounting data at line 818. The next statement at 820, tsk->mm is disposed and all data saved at tsk->mm is gone, including memory hiwater marks information saved at line 814. The complete tsk is disposed before exit of do_exit() routine. In separate emails discussion thread among interested parties, i asked Guillaume to clarify this question. I suspect ELSA counts on BSD's acct_process() at line 818 to save most accounting data. If that is the case and since ELSA wants extended accounting data collection, a way to save the extended acct data would be essential to ELSA as well. I can better asnwer your "why ELSA can do but CSA can't" question after i learn more from Guilluame. Later, - jay --- SF email is sponsored by - The IT Product Guide Read honest & candid reviews on hundreds of IT Products from real users. Discover which products truly live up to the hype. Start reading now. http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click ___ Lse-tech mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/lse-tech - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Lse-tech] Re: A common layer for Accounting packages
Chris Wright wrote: * Jay Lan ([EMAIL PROTECTED]) wrote: Andrew Morton wrote: Kaigai Kohei <[EMAIL PROTECTED]> wrote: In my understanding, what Andrew Morton said is "If target functionality can implement in user space only, then we should not modify the kernel-tree". fork, exec and exit upcalls sound pretty good to me. As long as a) they use the same common machinery and b) they are next-to-zero cost if something is listening on the netlink socket but no accounting daemon is running. Question is: is this sufficient for CSA? Yes, fork, exec, and exit upcalls are sufficient for CSA. As soon as you want to throttle tasks at the Job level, this would be insufficient. But, IIRC, that's not one of PAGG/Job/CSA's requirements right? PAGG serves more than JOB+CSA. I am looking into possiblity/feasibility of implementing JOB at userspace. However, even with JOB as a kernel module, the fork, exec and exit upcalls would be sufficient to support JOB+CSA. Thanks, - jay thanks, -chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Lse-tech] Re: A common layer for Accounting packages
Andrew Morton wrote: Kaigai Kohei <[EMAIL PROTECTED]> wrote: In my understanding, what Andrew Morton said is "If target functionality can implement in user space only, then we should not modify the kernel-tree". fork, exec and exit upcalls sound pretty good to me. As long as a) they use the same common machinery and b) they are next-to-zero cost if something is listening on the netlink socket but no accounting daemon is running. Question is: is this sufficient for CSA? Yes, fork, exec, and exit upcalls are sufficient for CSA. The framework i proposed earlier should satisfy your requirement a and b, and provides upcalls needed by BSD, ELSA and CSA. Maybe i misunderstood your concern of the 'very light weight' framework i proposed besides being "overkill"? - jay - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2.6.11-rc4-mm1] end-of-proces handling for acct-csa
Since my idea of providing an accounting framework was considered 'overkill', here i submit a tiny patch just to allow CSA to handle end-of-process (eop) situation by saving off accounting data before a task_struct is disposed. This patch is to modify the acct_process() in acct.c, which is invoked from do_exit() to handle eop for BSD accounting. Now the acct_process() wrapper will also take care of CSA, if it has been loaded. If the CSA module has been loaded, a CSA routine will be invoked to construct a CSA job record and to write the record to the CSA accounting file. This patch only touchs one file: kernel/acct.c. Signed-off-by: Jay Lan <[EMAIL PROTECTED]> Index: linux/kernel/acct.c === --- linux.orig/kernel/acct.c2005-02-24 15:55:05.519092861 -0800 +++ linux/kernel/acct.c 2005-02-24 16:33:56.381584083 -0800 @@ -73,6 +73,11 @@ int acct_parm[3] = {4, 2, 30}; /* * External references and all of the globals. */ + +/* do_exit hook used by CSA */ +void (*do_exit_csa)(int, struct task_struct *) = NULL; +EXPORT_SYMBOL_GPL(do_exit_csa); + static void do_acct_process(long, struct file *); /* @@ -504,12 +509,17 @@ static void do_acct_process(long exitcod } /* - * acct_process - now just a wrapper around do_acct_process + * acct_process - now just a wrapper around + * do_acct_process - for BSD accounting + * do_exit_csa - for CSA */ void acct_process(long exitcode) { struct file *file = NULL; + if (do_exit_csa != NULL) + do_exit_csa(exitcode, current); + /* * accelerate the common fastpath: */
Re: [Lse-tech] Re: A common layer for Accounting packages
Tim Schmielau wrote: On Tue, 22 Feb 2005, Andrew Morton wrote: We really want to avoid doing such stuff in-kernel if at all possible, of course. Is it not possible to implement the fork/exec/exit notifications to userspace so that a daemon can track the process relationships and perform aggregation based upon individual tasks' accounting? That's what one of the accounting systems is proposing doing, I believe. (In fact, why do we even need the notifications? /bin/ps can work this stuff out). I had started a proof of concept implementation that could reconstruct the whole process tree from userspace just from the BSD accounting currently in the kernel (+ the conceptual bug-fix that I misnamed "[RFC] "biological parent" pid"). This could do the whole job ID thing from userspace. Unfortunately, I haven't had time to work on it recently. Also, doing per-job accounting might actually be more lightweight than per-process accounting, so I'm not at all opposed to unifying CSA and BSD accounting into one mechanism that just writes different file formats. Thanks, Tim! After spending some time studying how ELSA works, it appeared to me that CSA still needs a hook for do_exit. Since people agreed that a complete framework was an overkill, i would be glad to submit another patch later just to provide a CSA exit-handling inside the acct_process(). Thanks, - jay A complete framework seems like overkill to me, too. Tim --- SF email is sponsored by - The IT Product Guide Read honest & candid reviews on hundreds of IT Products from real users. Discover which products truly live up to the hype. Start reading now. http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click ___ Lse-tech mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/lse-tech - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Lse-tech] Re: A common layer for Accounting packages
Hi Paul, I think the microbenchmarking your link provides is irrelevant. Your link provides benchmarking of doing a fork. However, we are talking about inserting a callback routine in a fork and/or an exit. The overhead is a function call and time spent in the routine. The callback routine can be configured to "do {} while (0)" if a certain CONFIG flag is not set. Thanks, - jay Paul Jackson wrote: So, I think such a fork/execve/exit hooks is harmless now. I don't recall seeing any microbenchmarking of the impact on fork/exit of such hooks. You might find such a benchmark in lmbench, or at http://bulk.fefe.de/scalability/. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Lse-tech] Re: A common layer for Accounting packages
Kaigai Kohei wrote: Hi, Thanks for your comments. >> I think there are two issues about system accounting framework. >> >> Issue: 1) How to define the appropriate unit for accounting ? >> Current BSD-accountiong make a collection per process accounting >> information. >> CSA make additionally a collection per process-aggregation accounting. > > > The 'enhanced acct data collection' patches that were added to > 2-6-11-rc* tree still do collection of per process data. Hmm, I have not noticed this extension. But I made sure about it. The following your two patches implements enhanced data collection, didn't it? Yes! - ChangeLog for 2.6.11-rc1 [PATCH] enhanced I/O accounting data patch [PATCH] enhanced Memory accounting data collection Since making a collection per process accounting is unified to the stock kernel, I want to have a discussion about remaining half, "How to define the appropriate unit for accounting ?" We can agree that only per process-accounting is so rigid, I think. Then, process-aggregation should be provided in one way or another. [1] Is it necessary 'fork/exec/exit' event handling framework ? The common agreement for the method of dealing with process aggregation has not been constructed yet, I understood. And, we will not able to integrate each process aggregation model because of its diverseness. For example, a process which belong to JOB-A must not belong any other 'JOB-X' in CSA-model. But, In ELSA-model, a process in BANK-B can concurrently belong to BANK-B1 which is a child of BANK-B. And, there are other defferences: Whether a process not to belong to any process-aggregation is permitted or not ? Whether a process-aggregation should be inherited to child process or not ? (There is possibility not to be inherited in a rule-based process aggregation like CKRM) Guillaume answered this question, and i think a policy would work. Some process-aggregation model have own philosophy and implemantation, so it's hard to integrate. Thus, I think that common 'fork/exec/exit' event handling framework to implement any kinds of process-aggregation. BSD needs an exit hook and ELSA needs a fork hook. I am still evaluating whether CSA can use the ELSA module. If CSA can use the ELSA module, CSA maybe would be fine with the fork hook. [2] What implementation should be adopted ? I think registerable hooks on fork/execve/exit is necessary, not only exit() hook. Because a rule or policy based process-aggregation model requirees to catch the transition of a process status. It might be enough to hook the exit() event only in process-accounting, but it's not kind for another customer. Thus, I recommend SGI's PAGG. In my understanding, the reason for not to include such a framework is that increase of unidentifiable (proprietary) modules is worried. If we code the hooks explicitly in the kernel, such as in acct.c, then the concern of unidentifiable modules should be taken care of. A registerable framework was my preference. But if that causes concern, it would be fine for me to do it explicit way. An example is for acct_process() to invoke do_acct_process(). That means whoever intends to use even an existing hook needs to present their cases, i guess. Thanks, - jay But, SI can divert LSM to implemente process-aggregation if they ignore the LSM's original purpose, for example. # I'm strongly opposed to such a movement as a SELinux's user :-) So, I think such a fork/execve/exit hooks is harmless now. Is this the time to unify it? Thanks. > CSA added those per-process data to per-aggregation ("job") data > structure at do_exit() time when a process termintes. > >> >> It is appropriate to make the fork-exit event handling framework for >> definition >> of the process-aggregation, such as PAGG. >> >> This system-accounting per process-aggregation is quite useful, >> thought I tried the SGI's implementation named 'job' in past days. >> >> >> Issue: 2) What items should be collected for accounting information ? >> BSD-accounting collects PID/UID/GID, User/Sys/Elapsed-Time, and # of >> minor/major page faults. SGI's CSA collects VM/RSS size on exit time, >> Integral-VM/RSS, and amount of block-I/O additionally. > > > These data are now collected in 2.6.11-rc* code. Note that these data > are still per-process. > >> >> I think it's hard to implement the accounting-engine as a kernel loadable >> module using any kinds of framework. Because, we must put callback >> functions >> into all around the kernel for this purpose. >> >> Thus, I make a proposion as follows: >> We should separate the process-aggregation functionality and collecting >> accounting informations. > > > I totally agree with this! Actually that was what we have done. The data > collection part of code has been unified. > >> Something of framework to implement process-aggregation is necessary. >> And, making a collection of accounting information should be merged >> into BSD-accounting and im
Re: [Lse-tech] Re: A common layer for Accounting packages
Guillaume Thouvenin wrote: On Tue, 2005-02-22 at 23:20 -0800, Andrew Morton wrote: Kaigai Kohei <[EMAIL PROTECTED]> wrote: The common agreement for the method of dealing with process aggregation has not been constructed yet, I understood. And, we will not able to integrate each process aggregation model because of its diverseness. For example, a process which belong to JOB-A must not belong any other 'JOB-X' in CSA-model. But, In ELSA-model, a process in BANK-B can concurrently belong to BANK-B1 which is a child of BANK-B. And, there are other defferences: Whether a process not to belong to any process-aggregation is permitted or not ? Whether a process-aggregation should be inherited to child process or not ? (There is possibility not to be inherited in a rule-based process aggregation like CKRM) Some process-aggregation model have own philosophy and implemantation, so it's hard to integrate. Thus, I think that common 'fork/exec/exit' event handling framework to implement any kinds of process-aggregation. I can add "policies". With ELSA, a process belongs to one or several groups and if a process is removed from one group, its children still belong to the group. Thus a good idea could be to associate a "philosophy" to a group. For exemple, when a group of processes is created it can be tagged as UNIQUE or SHARED. UNIQUE means that a process that belongs to it could not be added in another group by opposition to SHARED. It's not needed inside the kernel. This makes sense to me. CSA can use the UNIQUE policy to enforce its "can't escape from job container" philisophy. We really want to avoid doing such stuff in-kernel if at all possible, of course. Is it not possible to implement the fork/exec/exit notifications to userspace so that a daemon can track the process relationships and perform aggregation based upon individual tasks' accounting? That's what one of the accounting systems is proposing doing, I believe. It's what I'm proposing. The problem is to be alerted when a new process is created in order to add it in the correct group of processes if the parent belongs to one (or several) groups. The notification can be done with the fork connector patch. I am not quite comfortable of ELSA requesting a fork hook this way. How many hooks in the stock kernel that are related to accounting? Can anyone answer this question? I know of 'acct_process()' in exit.c used by the BSD accounting and ELSA is requesting a hook in fork. If people raise the same question again a few years later, how many people will still remember this ELSA hook? That was the reason i thought a central piece was a good idea. I would rather see the fork hook is coded in acct.c and then invokes a routine that handles what ELSA needs. If CSA would adopt the ELSA's daemon's approach, CSA may also need to use the fork hook. Actually the acct_process() was modified not long ago to become a wrapper, which then invokes do_acct_process() which is completely BSD specific. The fork hook can be the same. - jay (In fact, why do we even need the notifications? /bin/ps can work this stuff out). Yes it can but the risk is to lose some forks no? I think that /bin/ps is using the /proc interface. If we're polling the /proc to catch process creation we may lost some of them. With the fork connector we catch all forks and we can check that by using the sequence number (incremented by each fork) of the message. Guillaume - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Lse-tech] Re: A common layer for Accounting packages
Kaigai Kohei wrote: Hello, everyone. Andrew Morton wrote: > Jay Lan <[EMAIL PROTECTED]> wrote: > >>Since the need of Linux system accounting has gone beyond what BSD >>accounting provides, i think it is a good idea to create a thin layer >>of common code for various accounting packages, such as BSD accounting, >>CSA, ELSA, etc. The hook to do_exit() at exit.c was changed to invoke >>a routine in the common code which would then invoke those accounting >>packages that register to the acct_common to handle do_exit situation. > > > This all seems to be heading in the wrong direction. Do we really want to > have lots of different system accounting packages all hooking into a > generic we-cant-decide-what-to-do-so-we-added-some-pointless-overhead > framework? > > Can't we get _one_ accounting system in there, get it right, avoid the > framework? I think there are two issues about system accounting framework. Issue: 1) How to define the appropriate unit for accounting ? Current BSD-accountiong make a collection per process accounting information. CSA make additionally a collection per process-aggregation accounting. The 'enhanced acct data collection' patches that were added to 2-6-11-rc* tree still do collection of per process data. CSA added those per-process data to per-aggregation ("job") data structure at do_exit() time when a process termintes. It is appropriate to make the fork-exit event handling framework for definition of the process-aggregation, such as PAGG. This system-accounting per process-aggregation is quite useful, thought I tried the SGI's implementation named 'job' in past days. Issue: 2) What items should be collected for accounting information ? BSD-accounting collects PID/UID/GID, User/Sys/Elapsed-Time, and # of minor/major page faults. SGI's CSA collects VM/RSS size on exit time, Integral-VM/RSS, and amount of block-I/O additionally. These data are now collected in 2.6.11-rc* code. Note that these data are still per-process. I think it's hard to implement the accounting-engine as a kernel loadable module using any kinds of framework. Because, we must put callback functions into all around the kernel for this purpose. Thus, I make a proposion as follows: We should separate the process-aggregation functionality and collecting accounting informations. I totally agree with this! Actually that was what we have done. The data collection part of code has been unified. Something of framework to implement process-aggregation is necessary. And, making a collection of accounting information should be merged into BSD-accounting and implemented as a part of monolithic kernel as Guillaume said. This sounds good. I am interested in learning how ELSA saves off the per-process accounting data before the data got disposed. If that scheme works for CSA, we would be very happy to adopt the scheme. The current BSD scheme is very insufficient. The code is very BSD centric and it provides no way to handle process-aggregation. Thanks, - jay Thanks. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Lse-tech] Re: A common layer for Accounting packages
Guillaume Thouvenin wrote: On Fri, 2005-02-18 at 17:16 -0800, Andrew Morton wrote: Jay Lan <[EMAIL PROTECTED]> wrote: Since the need of Linux system accounting has gone beyond what BSD accounting provides, i think it is a good idea to create a thin layer of common code for various accounting packages, such as BSD accounting, CSA, ELSA, etc. The hook to do_exit() at exit.c was changed to invoke a routine in the common code which would then invoke those accounting packages that register to the acct_common to handle do_exit situation. This all seems to be heading in the wrong direction. Do we really want to have lots of different system accounting packages all hooking into a generic we-cant-decide-what-to-do-so-we-added-some-pointless-overhead framework? Can't we get _one_ accounting system in there, get it right, avoid the framework? Is it possible to just merge the BSD accounting and the CSA accounting by adding in the current BSD per-process accounting structure some missing fields like the mm integral provided by the CSA patch? Hi Guillaume, All raw data CSA needs already stored in task_struct of the process. ELSA is just a user of the accounting data. We need a hook in the do_fork() routine to manage group of processes, not to do accounting. I see at least three layers of functions in doing system accounting: data collection, handling of the raw data, and presentation of the data to users. We have merged the data collection part. :) Handling of the raw data seems done in ELSA by user spaced daemon and you are proposing to add a hook at fork time. I am interested in learning your approach. How ELSA adds per process accounting data to your grouping (banks) when a process exit? How do you save accounting data you need in task_struct before it is disposed? BSD handles that through acct_process() hook at do_exit(). CSA also depends on a hook at do_exit() to merge per-process data to per-job data. How does ELSA handle this without a need of a do_exit() hook? Thanks, - jay Guillaume --- SF email is sponsored by - The IT Product Guide Read honest & candid reviews on hundreds of IT Products from real users. Discover which products truly live up to the hype. Start reading now. http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click ___ Lse-tech mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/lse-tech - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
A common layer for Accounting packages
It started with the need of CSA to handle end-of-process (eop) at do_exit() at exit.c. The hook at exit.c was BSD Accounting specific. Since the need of Linux system accounting has gone beyond what BSD accounting provides, i think it is a good idea to create a thin layer of common code for various accounting packages, such as BSD accounting, CSA, ELSA, etc. The hook to do_exit() at exit.c was changed to invoke a routine in the common code which would then invoke those accounting packages that register to the acct_common to handle do_exit situation. Here is the description of this acct_common patch: 1) two new files at include/linux/acct_common.h and kernel/acct_common.c 2) A new config flag CONFIG_ACCT_COMMON is created and CONFIG_BSD_PROCESS_ACCT and a future CSA config flag depend on it. I think it is a good idea to always have acct_common in the kernel; in that case, the new config flag may not be really necessary. I can go either way. 3) Accounting packages can register themselves to acct_common for callbacks. Only do_exit handling is defined now. BSD acct.c has been modified to register/unergister to acct_common. 4) The 'enhanced acct data collection' routines have been moved from acct.c to acct_common.c. Files used to #include were modified to #include . This patch was generated against 2.6.11-rc3-mm2. Signed-off-by: Jay Lan <[EMAIL PROTECTED]> Index: linux/kernel/fork.c === --- linux.orig/kernel/fork.c2005-02-17 19:24:54.388927143 -0800 +++ linux/kernel/fork.c 2005-02-18 09:59:14.179816325 -0800 @@ -40,7 +40,7 @@ #include #include #include -#include +#include #include #include Index: linux/fs/exec.c === --- linux.orig/fs/exec.c2005-02-18 05:45:19.868493728 -0800 +++ linux/fs/exec.c 2005-02-18 09:59:14.196418021 -0800 @@ -47,7 +47,7 @@ #include #include #include -#include +#include #include #include Index: linux/kernel/exit.c === --- linux.orig/kernel/exit.c2005-02-17 19:24:54.380138011 -0800 +++ linux/kernel/exit.c 2005-02-18 10:05:43.506174767 -0800 @@ -17,7 +17,7 @@ #include #include #include -#include +#include #include #include #include @@ -814,7 +814,7 @@ fastcall NORET_TYPE void do_exit(long co group_dead = atomic_dec_and_test(&tsk->signal->live); if (group_dead) { del_timer_sync(&tsk->signal->real_timer); - acct_process(code); + acct_do_exit(code, tsk); } exit_mm(tsk); Index: linux/include/linux/acct_common.h === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux/include/linux/acct_common.h 2005-02-18 12:41:28.232626074 -0800 @@ -0,0 +1,83 @@ +/* + * Copyright (c) 2005 Silicon Graphics, Inc. + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * Jay Lan <[EMAIL PROTECTED]> + */ + +/* + * include/linux/acct_common.h + * + * Common Accounting for Linux - Definitions + * + * 02/17/05 Jay Lan <[EMAIL PROTECTED]>. Initial creation of the header file. + * This header file contains the definitions needed to support + * registration/unregistration of various accounting agents to + * the kernel. It also contains prototypes of common routines. + * + */ + +#ifndef _LINUX_ACCT_COMMON_H +#define _LINUX_ACCT_COMMON_H + +#include +#include +#include + + +/* + * Various accounting modules ID's + */ +#define ACCT_BSD 0 +#define ACCT_CSA 1 +#define ACCT_ELSA 2 +#define LAST_ACCT_ID ACCT_ELSA +#define ACCT_PKG_COUNT (LAST_ACCT_ID+1) + +/* + * Used by accounting packages to define the callback functions into the + * packages. + * + * STRUCT MEMBERS: + * pkg_id: Accounting package ID. This should be set the package + * when register. + * do_exit: Function pointer to function used when do_exit() + * hook is needed. + * This is optional and may be set to NULL if it is + *
initialize_acct_integrals
Hi Andrew, The new "move-accounting-function-calls-out-of-critical-vm-code-paths" patch in 2.6.11-rc3-mm2 was different from the code i tested. In particular, it mistakenly dropped the accounting routine calls in fs/exec.c. The calls in do_execve() are needed to properly initialize accounting fields. Specifically, the tsk->acct_stimexpd needs to be initialized to tsk->stime. I have discussed this with Christoph Lameter and he gave me full blessings to bring the calls back. This initialize_acct_integrals patch was generated against 2.6.11-rc3-mm2 to fix the problem. Thanks! Signed-off-by: Jay Lan <[EMAIL PROTECTED]> Index: linux/fs/exec.c === --- linux.orig/fs/exec.c2005-02-17 19:24:45.785343748 -0800 +++ linux/fs/exec.c 2005-02-18 05:45:19.868493728 -0800 @@ -1193,6 +1193,8 @@ int do_execve(char * filename, /* execve success */ security_bprm_free(bprm); + acct_update_integrals(current); + update_mem_hiwater(current); kfree(bprm); return retval; }
Re: move-accounting-function-calls-out-of-critical-vm-code-paths.patch
Andrew Morton wrote: Christoph Lameter <[EMAIL PROTECTED]> wrote: I hope that Roland's changes for higher resolution of cputime would make that possible. But this is Jay's thing not mine. I just want to make sure that the CSA patches does not get in the way of our attempts to improve the performance of the page fault handler. In the discussions on linux-mm there was also some concern about adding these calls. Well your patch certainly cleans things up in there and would be a good thing to have as long as we can be sure that it doesn't break the accounting in some subtle way. Which implies that we need to see some additional accounting code, so we can verify that the base accumulation infrastructure is doing the expected thing. As well as an ack from the interested parties. Does anyone know what's happening with all the new accounting initiatives? I'm seeing no activity at all. Sorry guys! I have been away for three weeks on short term disability.:( I have tested Christoph's patch before the leave. It did work for CSA and showed performance improvement on certain configuration. CSA is currently implemented as a loadable module. I think ELSA is the same, right? The use of the enhanced accounting data collection code is not in the kernel tree. That was why Andrew did not see usage of the accounting patches. Should i propose to include the CSA module in the kernel then, Andrew? :) Cheers, - jay - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/