Re: [patch] add kdump_after_notifier

2007-08-23 Thread Jay Lan
Vivek Goyal wrote:
> On Tue, Aug 21, 2007 at 06:18:31AM -0700, Jay Lan wrote:
> [..]
>>>>> Now user will be able to view all the die_chain users through sysfs and
>>>>> be able to modify the order in which these should run by modifying their
>>>>> priority. Hence all the RAS tools can co-exist.
>>>> This is my image of your proposal.
>>>>
>>>> - Print current order
>>>>
>>>> # cat /sys/class/misc/debug/panic_notifier_list
>>>> priority   name
>>>> 1  IPMI
>>>> 2  watchdog
>>>> 3  Kdb
>>>> 4  Kdump
>>>>
>>> I think Bernhard's suggestion looks better here. I noticed that 
>>> /sys/kernel/debug is already present. So how about following.
>>>
>>> /sys/kernel/debug/kdump/priority
>>> /sys/kernel/debug/kdb/priority
>>> /sys/kernel/debug/IPMI/priority
>> Why separate priority files is better than a central file?
>> At least i think you get a grand picture of priority being
>> defined for all parties with a central file?
>>
> 
> I thought of couple of reasons.
> - A very different syntax to modify the priority.
> - Separate directories allow easy future extensions in terms of more
>   files. For example, putting a small "description" file in each dir
>   where each registered user can specify what does it do.

The first can be easily resolved by providing a comment section in the
file with real examples. Users can simply uncomment a line to activate.
But future expansion is certainly is a good reason for this layout.

> 
> But I agree that a single file is good for consolidated view. As bernhard
> suggested, may be we should also implement a read only file where one
> will get a consolidated view.

Yep, this will help!

> 
>> What do we decide priority if more than one component has
>> the same priority value?
>>
> 
> I think first come first serve would be appropriate in this case instead of
> returning -EINVAL.

How does the kernel process the configuration files? By alphabetic order
of the filename? Either way, i think a clear failure/warning dmesg is
very important.

Thanks,
 - jay

> 
> Thanks
> Vivek
> 
> ___
> kexec mailing list
> [EMAIL PROTECTED]
> http://lists.infradead.org/mailman/listinfo/kexec

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] add kdump_after_notifier

2007-08-21 Thread Jay Lan
Vivek Goyal wrote:
> On Thu, Aug 16, 2007 at 06:26:35PM +0900, Takenori Nagano wrote:
>> Vivek Goyal wrote:
>>  > So for the time being I think we can put RAS tools on die notifier list
>>> and if it runs into issues we can always think of creating a separate list.
>>>
>>> Few things come to mind.
>>>
>>> - Why there is a separate panic_notifier_list? Can't it be merged with
>>>   die_chain? die_val already got one of the event type as PANIC. If there
>>>   are no specific reasons then we should merge the two lists. Registering
>>>   RAS tools on a single list is easier.
>> I think it is difficult, because die_chain is defined by each architecture.
>>
> 
> I think die_chain is arch independent definition (kernel/die_notifier.c)? 
> But anyway, to begin with it can be done only for panic_notifier.
> 
>>> - Modify Kdump to register on die_chain list. 
>>> - Modify Kdb to register on die_chain list.
>>> - Export all the registered members of die_chain through sysfs along with
>>>   their priorities. Priorities should be modifiable. Most likely one 
>>>   shall have to introduce additional field in struct notifier_block. This
>>>   field will be a string as an identifier of the user registerd. e.g
>>>   "Kdump", "Kdb" etc.
>>>
>>> Now user will be able to view all the die_chain users through sysfs and
>>> be able to modify the order in which these should run by modifying their
>>> priority. Hence all the RAS tools can co-exist.
>> This is my image of your proposal.
>>
>> - Print current order
>>
>> # cat /sys/class/misc/debug/panic_notifier_list
>> priority   name
>> 1  IPMI
>> 2  watchdog
>> 3  Kdb
>> 4  Kdump
>>
> 
> I think Bernhard's suggestion looks better here. I noticed that 
> /sys/kernel/debug is already present. So how about following.
> 
> /sys/kernel/debug/kdump/priority
> /sys/kernel/debug/kdb/priority
> /sys/kernel/debug/IPMI/priority

Why separate priority files is better than a central file?
At least i think you get a grand picture of priority being
defined for all parties with a central file?

What do we decide priority if more than one component has
the same priority value?

Thanks,
 - jay

> 
> I think at some point of time we shall have to create another file say
> description.
> 
> /sys/kernel/debug/IPMI/description
> 
> Which can tell what does this tool do? Other a user might not have any
> clue how to prioritize various things.
> 
> Thanks
> Vivek
> 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Performance Stats: Kernel patch

2007-06-06 Thread Jay Lan
Andrew Morton wrote:
> On Tue, 05 Jun 2007 14:43:50 + Maxim Uvarov <[EMAIL PROTECTED]> wrote:
> 
>> Patch makes available to the user the following
>> task and process performance statistics:
>>  * Involuntary Context Switches (task_struct->nivcsw)
>>  * Voluntary Context Switches (task_struct->nvcsw)
>> 
>> Statistics information is available from:
>>  1. taskstats interface (Documentation/accounting/)
>>  2. /proc/PID/status (task only).
>>
>> This data is useful for detecting hyperactivity
>> patterns between processes.
>> Signed-off-by: Maxim Uvarov <[EMAIL PROTECTED]>
> 
> Thanks.
> 
> Please retain suitable cc's when discussing a patch.  I keep on having to
> find your updates on the mailing list, and then adding all the same cc's
> when I reply.
> 
> Your patch seemed to be against some horridly prehistoric kernel, so I had
> a few things to fix up.
> 
> I disagree with the term "rates" for these things.  They are counters, not
> rates.  Adjustments are below.
> 
> We still need to think about whether we missed any other fields.

My initial cut of bringing in general accounting fields was to cover
the "struct acct" in linux/acct.h. It was done so that at least data
needed by BSD accounting is in "struct taskstats".

We need to poll other interested parties into this subject, or we can
add more fields later when users of other accounting fields decide
to take advantage of the taskstats interface.

Thanks,
 - jay


> 
> 
> diff -puN 
> Documentation/accounting/getdelays.c~taskstats-add-context-switch-counters-fix
>  Documentation/accounting/getdelays.c
> --- 
> a/Documentation/accounting/getdelays.c~taskstats-add-context-switch-counters-fix
> +++ a/Documentation/accounting/getdelays.c
> @@ -49,7 +49,7 @@ char name[100];
>  int dbg;
>  int print_delays;
>  int print_io_accounting;
> -int print_task_context_switch_rates;
> +int print_task_context_switch_counts;
>  __u64 stime, utime;
>  
>  #define PRINTF(fmt, arg...) {\
> @@ -205,7 +205,7 @@ void print_delayacct(struct taskstats *t
>  "count", "delay total", t->swapin_count, t->swapin_delay_total);
>  }
>  
> -void task_context_switch_rates(struct taskstats *t)
> +void task_context_switch_counts(struct taskstats *t)
>  {
>   printf("\n\nTask   %15s%15s\n"
>  "   %15lu%15lu\n",
> @@ -259,7 +259,7 @@ int main(int argc, char *argv[])
>   break;
>   case 'q':
>   printf("printing task/process context switch rates\n");
> - print_task_context_switch_rates = 1;
> + print_task_context_switch_counts = 1;
>   break;
>   case 'w':
>   logfile = strdup(optarg);
> @@ -402,8 +402,8 @@ int main(int argc, char *argv[])
>   print_delayacct((struct 
> taskstats *) NLA_DATA(na));
>   if (print_io_accounting)
>   print_ioacct((struct 
> taskstats *) NLA_DATA(na));
> - if 
> (print_task_context_switch_rates)
> - 
> task_context_switch_rates((struct taskstats *) NLA_DATA(na));
> + if 
> (print_task_context_switch_counts)
> + 
> task_context_switch_counts((struct taskstats *) NLA_DATA(na));
>   if (fd) {
>   if (write(fd, 
> NLA_DATA(na), na->nla_len) < 0) {
>   err(1,"write 
> error\n");
> diff -puN 
> Documentation/accounting/taskstats-struct.txt~taskstats-add-context-switch-counters-fix
>  Documentation/accounting/taskstats-struct.txt
> --- 
> a/Documentation/accounting/taskstats-struct.txt~taskstats-add-context-switch-counters-fix
> +++ a/Documentation/accounting/taskstats-struct.txt
> @@ -22,7 +22,7 @@ There are three different groups of fiel
>  /* Extended accounting fields end */
>  Their values are collected if CONFIG_TASK_XACCT is set.
>  
> -4) Per-task and per-thread context switch rates statistics
> +4) Per-task and per-thread context switch count statistics
>  
>  Future extension should add fields to the end of the taskstats struct, and
>  should not change the relative position of each field within the struct.
> diff -puN fs/proc/array.c~taskstats-add-context-switch-counters-fix 
> fs/proc/array.c
> --- a/fs/proc/array.c~taskstats-add-context-switch-counters-fix
> +++ a/fs/proc/array.c
> @@ -290,7 +290,9 @@ static inline char *task_cap(struct task
>   cap_t(p->cap_permitted),
>   cap_t(p->cap_effective));
>  }
> -static inline char *task_context_switch_rates(struct task_struct *p, char 

Re: [PATCH] Performance Stats: Kernel patch

2007-06-04 Thread Jay Lan
Andrew Morton wrote:
> On Wed, 30 May 2007 18:49:46 +
> Maxim Uvarov <[EMAIL PROTECTED]> wrote:
> 
>> Removed syscall counters from patch.
>>
>>
>>
>>
>> Patch makes available to the user the following
>> task and process performance statistics:
>>  * Involuntary Context Switches (task_struct->nivcsw)
>>  * Voluntary Context Switches (task_struct->nvcsw)
>> 
>> Statistics information is available from:
>> 1. taskstats interface (Documentation/accounting/)
>>  2. /proc/PID/status (task only).
>>
>> This data is useful for detecting hyperactivity
>> patterns between processes.
>> Signed-off-by: Maxim Uvarov <[EMAIL PROTECTED]>
>>   
> 
> A few little things:
> 
>> diff --git a/Documentation/accounting/getdelays.c 
>> b/Documentation/accounting/getdelays.c
>> index e9126e7..18d22ad 100644
>> --- a/Documentation/accounting/getdelays.c
>> +++ b/Documentation/accounting/getdelays.c
>> @@ -49,6 +49,7 @@ char name[100];
>>  int dbg;
>>  int print_delays;
>>  int print_io_accounting;
>> +int print_task_stats;
>>  __u64 stime, utime;
>>  
>>  #define PRINTF(fmt, arg...) {   \
>> @@ -187,7 +188,7 @@ void print_delayacct(struct taskstats *t)
>> "IO%15s%15s\n"
>> "  %15llu%15llu\n"
>> "MEM   %15s%15s\n"
>> -   "  %15llu%15llu\n\n",
>> +   "  %15llu%15llu\n"
>> "count", "real total", "virtual total", "delay total",
>> t->cpu_count, t->cpu_run_real_total, t->cpu_run_virtual_total,
>> t->cpu_delay_total,
>> @@ -196,6 +197,14 @@ void print_delayacct(struct taskstats *t)
>> "count", "delay total", t->swapin_count, t->swapin_delay_total);
>>  }
>>  
>> +void print_taskstats(struct taskstats *t)
>> +{
>> +printf("\n\nTask   %15s%15s\n"
>> +   "   %15lu%15lu\n",
>> +   "voluntary", "nonvoluntary",
>> +   t->nvcsw, t->nivcsw);
>> +}
> 
> print_task_stats versus print_taskstats is a bit confusing, but I guess it
> doesn't matter.
> 
> More significantly, the whole idea of calling it "task stats" isn't a good
> one: it's far too general.  The whole kernel interface is called taskstats,
> but the additions here are a tiny part of that.
> 
> Perhaps task_context_switch_rates would be more appropriate, although
> rather a lot to type.
> 
>>  void print_ioacct(struct taskstats *t)
>>  {
>>  printf("%s: read=%llu, write=%llu, cancelled_write=%llu\n",
>> @@ -227,7 +236,7 @@ int main(int argc, char *argv[])
>>  struct msgtemplate msg;
>>  
>>  while (1) {
>> -c = getopt(argc, argv, "diw:r:m:t:p:v:l");
>> +c = getopt(argc, argv, "qdiw:r:m:t:p:v:l");
>>  if (c < 0)
>>  break;
>>  
>> @@ -240,6 +249,10 @@ int main(int argc, char *argv[])
>>  printf("printing IO accounting\n");
>>  print_io_accounting = 1;
>>  break;
>> +case 'q':
>> +printf("printing task/process stasistics:\n");
>> +print_task_stats = 1;
>> +break;
>>  case 'w':
>>  strncpy(logfile, optarg, MAX_FILENAME);
>>  printf("write to file %s\n", logfile);
>> @@ -381,6 +394,8 @@ int main(int argc, char *argv[])
>>  print_delayacct((struct 
>> taskstats *) NLA_DATA(na));
>>  if (print_io_accounting)
>>  print_ioacct((struct 
>> taskstats *) NLA_DATA(na));
>> +if (print_task_stats)
>> +print_taskstats((struct 
>> taskstats *) NLA_DATA(na));
>>  if (fd) {
>>  if (write(fd, 
>> NLA_DATA(na), na->nla_len) < 0) {
>>  err(1,"write 
>> error\n");
>> diff --git a/Documentation/accounting/taskstats-struct.txt 
>> b/Documentation/accounting/taskstats-struct.txt
>> index 661c797..c3ae6a9 100644
>> --- a/Documentation/accounting/taskstats-struct.txt
>> +++ b/Documentation/accounting/taskstats-struct.txt
>> @@ -22,6 +22,8 @@ There are three different groups of fields in the struct 
>> taskstats:
>>  /* Extended accounting fields end */
>>  Their values are collected if CONFIG_TASK_XACCT is set.
>>  
>> +4) Per-task and per-thread statistics
>> +
>>  Future extension should add fields to the end of the taskstats struct, and
>>  should not change the relative position of each field within the struct.
>>  
>> @@ -158,4 +160,8 @@ struct taskstats {
>>  
>>  /* Extended accounting fields end */
>>  
>> +4) Per-task and per-thread statistics
>> +__u64   nvcsw;  /* Context vo

Re: [PATCH] Performance Stats: Kernel patch

2007-06-04 Thread Jay Lan
Add Jonathan Lim to cc, who is working on CSA userland implementation
to use the taskstats data that this patch is going to remove.

Thanks,
 - jay



Andrew Morton wrote:
> On Wed, 30 May 2007 18:49:46 +
> Maxim Uvarov <[EMAIL PROTECTED]> wrote:
> 
>> Removed syscall counters from patch.
>>
>>
>>
>>
>> Patch makes available to the user the following
>> task and process performance statistics:
>>  * Involuntary Context Switches (task_struct->nivcsw)
>>  * Voluntary Context Switches (task_struct->nvcsw)
>> 
>> Statistics information is available from:
>> 1. taskstats interface (Documentation/accounting/)
>>  2. /proc/PID/status (task only).
>>
>> This data is useful for detecting hyperactivity
>> patterns between processes.
>> Signed-off-by: Maxim Uvarov <[EMAIL PROTECTED]>
>>   
> 
> A few little things:
> 
>> diff --git a/Documentation/accounting/getdelays.c 
>> b/Documentation/accounting/getdelays.c
>> index e9126e7..18d22ad 100644
>> --- a/Documentation/accounting/getdelays.c
>> +++ b/Documentation/accounting/getdelays.c
>> @@ -49,6 +49,7 @@ char name[100];
>>  int dbg;
>>  int print_delays;
>>  int print_io_accounting;
>> +int print_task_stats;
>>  __u64 stime, utime;
>>  
>>  #define PRINTF(fmt, arg...) {   \
>> @@ -187,7 +188,7 @@ void print_delayacct(struct taskstats *t)
>> "IO%15s%15s\n"
>> "  %15llu%15llu\n"
>> "MEM   %15s%15s\n"
>> -   "  %15llu%15llu\n\n",
>> +   "  %15llu%15llu\n"
>> "count", "real total", "virtual total", "delay total",
>> t->cpu_count, t->cpu_run_real_total, t->cpu_run_virtual_total,
>> t->cpu_delay_total,
>> @@ -196,6 +197,14 @@ void print_delayacct(struct taskstats *t)
>> "count", "delay total", t->swapin_count, t->swapin_delay_total);
>>  }
>>  
>> +void print_taskstats(struct taskstats *t)
>> +{
>> +printf("\n\nTask   %15s%15s\n"
>> +   "   %15lu%15lu\n",
>> +   "voluntary", "nonvoluntary",
>> +   t->nvcsw, t->nivcsw);
>> +}
> 
> print_task_stats versus print_taskstats is a bit confusing, but I guess it
> doesn't matter.
> 
> More significantly, the whole idea of calling it "task stats" isn't a good
> one: it's far too general.  The whole kernel interface is called taskstats,
> but the additions here are a tiny part of that.
> 
> Perhaps task_context_switch_rates would be more appropriate, although
> rather a lot to type.
> 
>>  void print_ioacct(struct taskstats *t)
>>  {
>>  printf("%s: read=%llu, write=%llu, cancelled_write=%llu\n",
>> @@ -227,7 +236,7 @@ int main(int argc, char *argv[])
>>  struct msgtemplate msg;
>>  
>>  while (1) {
>> -c = getopt(argc, argv, "diw:r:m:t:p:v:l");
>> +c = getopt(argc, argv, "qdiw:r:m:t:p:v:l");
>>  if (c < 0)
>>  break;
>>  
>> @@ -240,6 +249,10 @@ int main(int argc, char *argv[])
>>  printf("printing IO accounting\n");
>>  print_io_accounting = 1;
>>  break;
>> +case 'q':
>> +printf("printing task/process stasistics:\n");
>> +print_task_stats = 1;
>> +break;
>>  case 'w':
>>  strncpy(logfile, optarg, MAX_FILENAME);
>>  printf("write to file %s\n", logfile);
>> @@ -381,6 +394,8 @@ int main(int argc, char *argv[])
>>  print_delayacct((struct 
>> taskstats *) NLA_DATA(na));
>>  if (print_io_accounting)
>>  print_ioacct((struct 
>> taskstats *) NLA_DATA(na));
>> +if (print_task_stats)
>> +print_taskstats((struct 
>> taskstats *) NLA_DATA(na));
>>  if (fd) {
>>  if (write(fd, 
>> NLA_DATA(na), na->nla_len) < 0) {
>>  err(1,"write 
>> error\n");
>> diff --git a/Documentation/accounting/taskstats-struct.txt 
>> b/Documentation/accounting/taskstats-struct.txt
>> index 661c797..c3ae6a9 100644
>> --- a/Documentation/accounting/taskstats-struct.txt
>> +++ b/Documentation/accounting/taskstats-struct.txt
>> @@ -22,6 +22,8 @@ There are three different groups of fields in the struct 
>> taskstats:
>>  /* Extended accounting fields end */
>>  Their values are collected if CONFIG_TASK_XACCT is set.
>>  
>> +4) Per-task and per-thread statistics
>> +
>>  Future extension should add fields to the end of the taskstats struct, and
>>  should not change the relative position of each field within the struct.
>>  
>> @@ -158,4 +160,8 @@ struct taskstats {

Re: [Fastboot] [PATCH 1/1] - platform_kernel_launch_event is noop on generic kernel

2007-03-05 Thread Jay Lan
Bernhard Walle wrote:
> * Bernhard Walle <[EMAIL PROTECTED]> [2007-03-01 13:57]:
>> * Horms <[EMAIL PROTECTED]> [2007-03-01 05:22]:
>>> On Wed, Feb 28, 2007 at 01:45:17PM -0600, John Keller wrote:
>>>> Add a missing #define for the platform_kernel_launch_event.
>>>> Without this fix, a call to platform_kernel_launch_event()
>>>> becomes a noop on generic kernels. SN systems require this
>>>> fix to successfully kdump/kexec from certain hardware errors.
>>>>
>>>> Signed-off-by: John Keller <[EMAIL PROTECTED]>
>>> I made a similar change when porting to xen, but I hadn't thought
>>> to see if mainline linux needs it to.
>>>
>>> Acked-by: Simon Horman <[EMAIL PROTECTED]>
>> I think there's an additional change needed. Without that, it leads to
>> a NULL pointer dereference.

Yep, it is needed. The missing line was in Jack's original patch
but was lost somehow in 2.6.20.

Acked-by: Jay Lan <[EMAIL PROTECTED]>

> 
> Maybe I should also get my editor right ... ;):
> 
> ---
>  include/asm-ia64/machvec.h |2 ++
>  1 file changed, 2 insertions(+)
> 
> Index: b/include/asm-ia64/machvec.h
> ===
> --- a/include/asm-ia64/machvec.h
> +++ b/include/asm-ia64/machvec.h
> @@ -168,6 +168,7 @@ extern void machvec_tlb_migrate_finish (
>  #  define platform_setup_msi_irq ia64_mv.setup_msi_irq
>  #  define platform_teardown_msi_irq  ia64_mv.teardown_msi_irq
>  #  define platform_pci_fixup_bus ia64_mv.pci_fixup_bus
> +#  define platform_kernel_launch_event   ia64_mv.kernel_launch_event
>  # endif
>  
>  /* __attribute__((__aligned__(16))) is required to make size of the
> @@ -269,6 +270,7 @@ struct ia64_machine_vector {
>   platform_setup_msi_irq, \
>   platform_teardown_msi_irq,  \
>   platform_pci_fixup_bus, \
> + platform_kernel_launch_event\
>  }
>  
>  extern struct ia64_machine_vector ia64_mv;
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kexec: Fix CONFIG_SMP=n compilation V2 (ia64)

2007-02-05 Thread Jay Lan
Magnus Damm wrote:
> kexec: Fix CONFIG_SMP=n compilation V2 (ia64)
> 
> Kexec support for 2.6.20 on ia64 does not build properly using a config
> made up by CONFIG_SMP=n and CONFIG_HOTPLUG_CPU=n:
> 
>   CC  arch/ia64/kernel/machine_kexec.o
> arch/ia64/kernel/machine_kexec.c: In function `machine_shutdown':
> arch/ia64/kernel/machine_kexec.c:77: warning: implicit declaration of 
> function `cpu_down'
>   AS  arch/ia64/kernel/relocate_kernel.o
>   CC  arch/ia64/kernel/crash.o
> arch/ia64/kernel/crash.c: In function `kdump_cpu_freeze':
> arch/ia64/kernel/crash.c:139: warning: implicit declaration of function 
> `ia64_jump_to_sal'
> arch/ia64/kernel/crash.c:139: error: `sal_boot_rendez_state' undeclared 
> (first use in this function)
> arch/ia64/kernel/crash.c:139: error: (Each undeclared identifier is reported 
> only once
> arch/ia64/kernel/crash.c:139: error: for each function it appears in.)
> arch/ia64/kernel/crash.c: At top level:
> arch/ia64/kernel/crash.c:84: warning: 'kdump_wait_cpu_freeze' defined but not 
> used
> make[1]: *** [arch/ia64/kernel/crash.o] Error 1
> make: *** [arch/ia64/kernel] Error 2
> 
> Signed-off-by: Magnus Damm <[EMAIL PROTECTED]>
> ---

Looks good to me also.

Acked-by: Jay Lan <[EMAIL PROTECTED]>

> machine_kexec.c
>  Applies on top of 2.6.20.
> 
>  arch/ia64/kernel/crash.c |   11 +++
>  arch/ia64/kernel/machine_kexec.c |2 ++
>  2 files changed, 9 insertions(+), 4 deletions(-)
> 
> --- 0002/arch/ia64/kernel/crash.c
> +++ work/arch/ia64/kernel/crash.c 2007-02-02 20:31:18.0 +0900
> @@ -79,6 +79,7 @@ crash_save_this_cpu()
>   final_note(buf);
>  }
>  
> +#ifdef CONFIG_SMP
>  static int
>  kdump_wait_cpu_freeze(void)
>  {
> @@ -91,6 +92,7 @@ kdump_wait_cpu_freeze(void)
>   }
>   return 1;
>  }
> +#endif
>  
>  void
>  machine_crash_shutdown(struct pt_regs *pt)
> @@ -132,11 +134,12 @@ kdump_cpu_freeze(struct unw_frame_info *
>   atomic_inc(&kdump_cpu_freezed);
>   kdump_status[cpuid] = 1;
>   mb();
> - if (cpuid == 0) {
> - for (;;)
> - cpu_relax();
> - } else
> +#ifdef CONFIG_HOTPLUG_CPU
> + if (cpuid != 0)
>   ia64_jump_to_sal(&sal_boot_rendez_state[cpuid]);
> +#endif
> + for (;;)
> + cpu_relax();
>  }
>  
>  static int
> --- 0002/arch/ia64/kernel/machine_kexec.c
> +++ work/arch/ia64/kernel/machine_kexec.c 2007-02-02 20:56:24.0 
> +0900
> @@ -70,12 +70,14 @@ void machine_kexec_cleanup(struct kimage
>  
>  void machine_shutdown(void)
>  {
> +#ifdef CONFIG_HOTPLUG_CPU
>   int cpu;
>  
>   for_each_online_cpu(cpu) {
>   if (cpu != smp_processor_id())
>   cpu_down(cpu);
>   }
> +#endif
>   kexec_disable_iosapic();
>  }
>  
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Fastboot] [PATCH] kexec: Fix CONFIG_SMP=n compilation (ia64)

2007-02-02 Thread Jay Lan
Horms wrote:
> On Fri, Feb 02, 2007 at 11:01:21AM +0800, Zou, Nanhai wrote:
>  void
>  machine_crash_shutdown(struct pt_regs *pt)
> @@ -132,11 +134,12 @@ kdump_cpu_freeze(struct unw_frame_info *
>   atomic_inc(&kdump_cpu_freezed);
>   kdump_status[cpuid] = 1;
>   mb();
> - if (cpuid == 0) {
> - for (;;)
> - cpu_relax();
> - } else
> +#ifdef CONFIG_HOTPLUG_CPU
> + if (cpuid != 0)
>   ia64_jump_to_sal(&sal_boot_rendez_state[cpuid]);
> +#endif
> + for (;;)
> + cpu_relax();
>  }
 I trust ia64_jump_to_sal doesn't return.
>>> So do I. The main problem with the compilation seems to be that
>>> ia64_jump_to_sal() only exists if CONFIG_HOTPLUG_CPU=y.
>>> (include/asm-ia64/sal.h, arch/ia64/kernel/head.S)
>>>
>>  This may cause problem on SN platform. 
>> I remember SN requires cpu0 return to SAL rendez loop to do IRQ redirection.
>> However this needs SGI people to confirm...

SN needs slave cpus being returned to SAL rendez loop with the exception
of cpu0. It seems there is not a decent way to return cpu0, so we
decided to handle cpu0 in PROM. This above code change does not casue
problem on SN platform.

> 
> Does this mean that CONFIG_HOTPLUG_CPU may be required for kdump
> on the SN platform?

The SN platform uses the ia64_jump_to_sal() routine.

Thanks,
 - jay

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Fastboot] [PATCH] Kdump documentation update for 2.6.20: ia64 portion

2007-01-12 Thread Jay Lan
Horms wrote:
> Hi,
> 
> this patch fills in the portions for ia64 kexec.
> 
> I'm actually not sure what options are required for the dump-capture
> kernel, but "init 1 irqpoll maxcpus=1" has been working fine for me.
> Or more to the point, I'm not sure if irqpoll is needed or not.
> 
> This patch requires the documentation patch update that Vivek Goyal has
> been circulating, and I believe is currently in mm. Feel free to fold it
> into that change if it makes things easier for anyone.
> 
> Take II
> 
> Nanhai,
> 
> I have noted that vmlinux.gz may also be used. And added a note about the
> kernel being able to automatically place the crashkernel region.
> Furthermore, I added a note that if manually specified, the region should
> be 64Mb aligned to avoid wastage. I notice that the auto placement code
> uses 64Mb. But is this strictly neccessary for all page sizes?
> 
> Signed-off-by: Simon Horman <[EMAIL PROTECTED]>
> 
> Index: linux-2.6/Documentation/kdump/kdump.txt
> ===
> --- linux-2.6.orig/Documentation/kdump/kdump.txt  2007-01-12 
> 17:45:19.0 +0900
> +++ linux-2.6/Documentation/kdump/kdump.txt   2007-01-12 17:59:42.0 
> +0900
> @@ -17,7 +17,7 @@
>  memory image to a dump file on the local disk, or across the network to
>  a remote system.
>  
> -Kdump and kexec are currently supported on the x86, x86_64, ppc64 and IA64
> +Kdump and kexec are currently supported on the x86, x86_64, ppc64 and ia64
>  architectures.
>  
>  When the system kernel boots, it reserves a small section of memory for
> @@ -229,7 +229,23 @@
>  
>  Dump-capture kernel config options (Arch Dependent, ia64)
>  --
> -(To be filled)
> +
> +- No specific options are required to create a dump-capture kernel
> +  for ia64, other than those specified in the arch idependent section
> +  above. This means that it is possible to use the system kernel
> +  as a dump-capture kernel if desired.
> +  
> +  The crashkernel region can be automatically placed by the system
> +  kernel at run time. This is done by specifying the base address as 0,
> +  or omitting it all together.

In my testing, i found the base address was ignored. Whatever value
specified was fine. Not necessary to be 0. But i guess it is fine to
give people a guideline telling them to specify 0.

Cheers,
 - jay


> +
> +  [EMAIL PROTECTED]
> +  or
> +  crashkernel=256M
> +
> +  If the start address is specified, not that the start address of the
> +  kernel will be alligned to 64Mb, so any if the start address is not then
> +  any space below the alignment point will be wasted.
>  
>  
>  Boot into System Kernel
> @@ -248,6 +264,10 @@
>  
> On ppc64, use "[EMAIL PROTECTED]".
>  
> +   On ia64, [EMAIL PROTECTED] is a generous value that typically works.
> +   The region may be automatically placed on ia64, see the
> +   dump-capture kernel config option notes above.
> +
>  Load the Dump-capture Kernel
>  
>  
> @@ -266,7 +286,8 @@
>  For ppc64:
>   - Use vmlinux
>  For ia64:
> - (To be filled)
> + - Use vmlinux or vmlinuz.gz
> +
>  
>  If you are using a uncompressed vmlinux image then use following command
>  to load dump-capture kernel.
> @@ -282,18 +303,19 @@
> --initrd= \
> --append="root= "
>  
> +Please note, that --args-linux does not need to be specified for ia64.
> +It is planned to make this a no-op on that architecture, but for now
> +it should be omitted
> +
>  Following are the arch specific command line options to be used while
>  loading dump-capture kernel.
>  
> -For i386 and x86_64:
> +For i386, x86_64 and ia64:
>   "init 1 irqpoll maxcpus=1"
>  
>  For ppc64:
>   "init 1 maxcpus=1 noirqdistrib"
>  
> -For IA64
> - (To be filled)
> -
>  
>  Notes on loading the dump-capture kernel:
>  
> ___
> fastboot mailing list
> fastboot@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/fastboot

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] largefile support for accounting

2005-08-11 Thread Jay Lan

Yes, our CSA customers reported same problem. CSA already incoporated
same fix to our kernel module code. :)

Cheers,
 - jay


Peter Staubach wrote:

Hi.

There is a problem in the accounting subsystem in the kernel can not
correctly handle files larger than 2GB.  The output file containing
the process accounting data can grow very large if the system is large
enough and active enough.  If the 2GB limit is reached, then the system
simply stops storing process accounting data.

Another annoying problem is that once the system reaches this 2GB limit,
then every process which exits will receive a signal, SIGXFSZ.  This
signal is generated because an attempt was made to write beyond the limit
for the file descriptor.  This signal makes it look like every process
has exited due to a signal, when in fact, they have not.

The solution is to add the O_LARGEFILE flag to the list of flags used
to open the accounting file.  The rest of the accounting support is
already largefile safe.

The changes were tested by constructing a large file (just short of 2GB),
enabling accounting, and then running enough commands to cause the
accounting data generated to increase the size of the file to 2GB.  Without
the changes, the file grows to 2GB and the last command run in the test
script appears to exit due a signal when it has not.  With the changes,
things work as expected and quietly.

There are some user level changes required so that it can deal with
largefiles, but those are being handled separately.

Signed-off-by: Peter Staubach <[EMAIL PROTECTED]>




--- linux-2.6.12/kernel/acct.c.org  2005-06-17 15:48:29.0 -0400
+++ linux-2.6.12/kernel/acct.c  2005-08-10 15:12:46.0 -0400
@@ -220,7 +220,7 @@ asmlinkage long sys_acct(const char __us
return (PTR_ERR(tmp));
}
/* Difference from BSD - they don't do O_APPEND */
-   file = filp_open(tmp, O_WRONLY|O_APPEND, 0);
+   file = filp_open(tmp, O_WRONLY | O_APPEND | O_LARGEFILE, 0);
putname(tmp);
if (IS_ERR(file)) {
return (PTR_ERR(file));


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2.6.12-rc1-mm4] fork_connector: add a fork connector

2005-04-10 Thread Jay Lan
I based my listen program on the fclisten.c posted by Kaigai Kohei
with my own modification. Unfortunately i lost my test machine in the
lab. I will recreate the listen program Monday. The original listener
did not validate sequence number. It also prints length of data and
sequence number of every message it receives. My listener prints
only out-of-sequence error messages.
The fork generator fork-test.c was yours? I called it fork-test
and let it run continuously in while-loop:
# while 1
# ./fork-test 1000
# sleep 1
# end
I let it do 10,000,000 times of fork continuously while the system
is running AIM7 and/or ubench.
The original fclisten.c and fork-test.c are attached for your reference.
- jay
Evgeniy Polyakov wrote:
On Fri, 08 Apr 2005 20:31:20 -0700
Jay Lan <[EMAIL PROTECTED]> wrote:

With the patch you provide to me, i did not see the bugcheck
at cn_queue_wrapper() at the console.
Unmatched sequence number messages still happened. We expect
to lose packets under system stressed situation, but i still
observed duplicate messages, which concerned me.

What tool and what version do you use 
as message generator and receiver?


Thanks,
 - jay

Evgeniy Polyakov
Only failure makes us experts. -- Theo de Raadt
#include 
#include 
#include 
#include 
#include 
#include 
#include 

void usage(){
  puts("usage: fclisten ");
  puts("  Default -> listening fork-connector");
  puts("  on  -> fork-connector Enable");
  puts("  off -> fork-connector Disable");
  exit(0);
}

#define MODE_LISTEN  (1)
#define MODE_ENABLE  (2)
#define MODE_DISABLE (3)

struct cb_id
{
  __u32   idx;
  __u32   val;
};

struct cn_msg
{
  struct cb_idid;
  __u32   seq;
  __u32   ack;
  __u32   len;/* Length of the following data */
  __u8data[0];
};


int main(int argc, char *argv[]){
  char buf[4096];
  int mode, sockfd, len;
  struct sockaddr_nl ad;
  struct nlmsghdr *hdr = (struct nlmsghdr *)buf;
  struct cn_msg *msg = (struct cn_msg *)(buf+sizeof(struct nlmsghdr));
  
  switch(argc){
  case 1:
mode = MODE_LISTEN;
break;
  case 2:
if (strcasecmp("on",argv[1])==0) {
  mode = MODE_ENABLE;
}else if (strcasecmp("off",argv[1])==0){
  mode = MODE_DISABLE;
}else{
  usage();
}
break;
  default:
usage();
break;
  }
  
  if( (sockfd=socket(PF_NETLINK, SOCK_RAW, NETLINK_NFLOG)) < 0 ){
fprintf(stderr, "Fault on socket().\n");
return( 1 );
  }
  ad.nl_family = AF_NETLINK;
  ad.nl_pad = 0;
  ad.nl_pid = getpid();
  ad.nl_groups = CN_IDX_FORK;
  if( bind(sockfd, (struct sockaddr *)&ad, sizeof(ad)) ){
fprintf(stderr, "Fault on bind to netlink.\n");
return( 2 );
  }

  if (mode==MODE_LISTEN) {
while(-1){
  len = recvfrom(sockfd, buf, 4096, 0, NULL, NULL);
  printf("%d-byte recv Seq=%d\n", len, hdr->nlmsg_seq);
}
  }else{
ad.nl_family = AF_NETLINK;
ad.nl_pad = 0;
ad.nl_pid = 0;
ad.nl_groups = 1;

hdr->nlmsg_len = sizeof(struct nlmsghdr) + sizeof(struct cn_msg) + 
sizeof(int);
hdr->nlmsg_type = 0;
hdr->nlmsg_flags = 0;
hdr->nlmsg_seq = 0;
hdr->nlmsg_pid = getpid();
msg->id.idx = 0xfeed;
msg->id.val = 0xbeef;
msg->seq = msg->ack = 0;
msg->len = sizeof(int);

if (mode==MODE_ENABLE){
  (*(int *)(msg->data)) = 1;
} else {
  (*(int *)(msg->data)) = 0;
}
sendto(sockfd, buf, sizeof(struct nlmsghdr)+sizeof(struct 
cn_msg)+sizeof(int),
   0, (struct sockaddr *)&ad, sizeof(ad));
  }
}
#include 
#include 

int main(int argc, char *argv[])
{
int pid;
int i = 0, max = 10;
struct timeval tv0, tv1;
struct timezone tz;
long diff;

if (argc >= 2)
max = atoi(argv[1]);

signal(SIGCHLD, SIG_IGN);

gettimeofday(&tv0, &tz);
while (i++ < max) {
pid = fork();
if (pid == 0) {
sleep(1);
exit (0);
}
}
gettimeofday(&tv1, &tz);

diff = (tv1.tv_sec - tv0.tv_sec)*100 + (tv1.tv_usec - tv0.tv_usec);

printf("Average per process fork+exit time is %ld usecs [diff=%lu, 
max=%d].\n", diff/max, diff, max);
return 0;
}



Re: [patch 2.6.12-rc1-mm4] fork_connector: add a fork connector

2005-04-08 Thread Jay Lan
With the patch you provide to me, i did not see the bugcheck
at cn_queue_wrapper() at the console.
Unmatched sequence number messages still happened. We expect
to lose packets under system stressed situation, but i still
observed duplicate messages, which concerned me.
Unmatched seq. Rcvd=79477, expected=79478   <=== duplicate
Unmatched seq. Rcvd=713823, expected=713422 * loss of 401 msgs
Unmatched seq. Rcvd=80024, expected=79901   * loss of 123 msgs
Unmatched seq. Rcvd=93632, expected=93633   <=== duplicate
Unmatched seq. Rcvd=94718, expected=93970
Unmatched seq. Rcvd=743576, expected=743502
Unmatched seq. Rcvd=123506, expected=123507 <=== duplicate
Unmatched seq. Rcvd=773753, expected=773503
Unmatched seq. Rcvd=124111, expected=123938
Unmatched seq. Rcvd=157172, expected=157173 <=== duplicate
Unmatched seq. Rcvd=813024, expected=812913 <=== duplicate
Unmatched seq. Rcvd=813024, expected=813025 <=== duplicate
Unmatched seq. Rcvd=157830, expected=157501
Unmatched seq. Rcvd=158408, expected=158145
Unmatched seq. Rcvd=813678, expected=813438
The test system was a two cpu ia64.
Thanks,
 - jay
Evgeniy Polyakov wrote:
On Fri, 08 Apr 2005 15:08:13 -0700
Jay Lan <[EMAIL PROTECTED]> wrote:

Hi Evgeniy,
Forget about my previous request of a new patch.
The failures were straight forward enough to figure out.

Ok.
The latest sources are always awailable at
http://tservice.net.ru/~s0mbre/archive/connector

- jay
Jay Lan wrote:
My workarea was based on 2.6.12-rc1-mm4 plus Guilluame's patch.
Your patch caused 5 out of 8 hunks failure at connector.c
and one failure at connector.h.
Could you generate a new patch based on my version? A tar
file of complete source of drivers/connector would work
also. :)
Thanks!
- jay
Evgeniy Polyakov wrote:

Could you give attached patch a try instead of previous one.
It adds gfp mask into cn_netlink_send() call also.
If you need updated CBUS sources, feel free to ask, I will send 
updated sources with Andrew's comments resolved too.

I do not know exactly your connector version, so patch will probably 
be applied with fuzz.

feel free to contact if it does not apply, I will send
the whole sources.
Thank you.
* looking for [EMAIL PROTECTED]/connector--main--0--patch-38 to 
compare with
* comparing to [EMAIL PROTECTED]/connector--main--0--patch-38
M  connector.c
M  connector.h
M  cbus.c

* modified files
--- orig/drivers/connector/connector.c
+++ mod/drivers/connector/connector.c
@@ -70,7 +70,7 @@
 * then it is new message.
 *
 */
-void cn_netlink_send(struct cn_msg *msg, u32 __groups)
+void cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask)
{
struct cn_callback_entry *n, *__cbq;
unsigned int size;
@@ -102,7 +102,7 @@
size = NLMSG_SPACE(sizeof(*msg) + msg->len);
-skb = alloc_skb(size, GFP_ATOMIC);
+skb = alloc_skb(size, gfp_mask);
if (!skb) {
printk(KERN_ERR "Failed to allocate new skb with size=%u.\n", 
size);
return;
@@ -119,11 +119,11 @@
#endif
   
NETLINK_CB(skb).dst_groups = groups;
-
-uskb = skb_clone(skb, GFP_ATOMIC);
-if (uskb)
+#if 0
+uskb = skb_clone(skb, gfp_mask);
+if (uskb && 0)
netlink_unicast(dev->nls, uskb, 0, 0);
-   
+#endif   
netlink_broadcast(dev->nls, skb, 0, groups, GFP_ATOMIC);

return;
@@ -158,7 +158,7 @@
}
spin_unlock_bh(&dev->cbdev->queue_lock);
-return found;
+return (found)?0:-ENODEV;
}
/*
@@ -181,7 +181,6 @@
"requested msg->len=%u[%u], nlh->nlmsg_len=%u, 
skb->len=%u.\n",
msg->len, NLMSG_SPACE(msg->len + sizeof(*msg)),
nlh->nlmsg_len, skb->len);
-kfree_skb(skb);
return -EINVAL;
}
#if 0
@@ -215,17 +214,18 @@
   skb->len, skb->data_len, skb->truesize, skb->protocol,
   skb_cloned(skb), skb_shared(skb));
#endif
-while (skb->len >= NLMSG_SPACE(0)) {
+if (skb->len >= NLMSG_SPACE(0)) {
nlh = (struct nlmsghdr *)skb->data;
+
if (nlh->nlmsg_len < sizeof(struct cn_msg) ||
skb->len < nlh->nlmsg_len ||
nlh->nlmsg_len > CONNECTOR_MAX_MSG_SIZE) {
-#if 0
+#if 1
printk(KERN_INFO "nlmsg_len=%u, sizeof(*nlh)=%u\n",
   nlh->nlmsg_len, sizeof(*nlh));
#endif
kfree_skb(skb);
-break;
+goto out;
}

len = NLMSG_ALIGN(nlh->nlmsg_len);
@@ -233,22 +233,11 @@
len = skb->len;
err = __cn_rx_skb(skb, nlh);
-if (err) {
-#if 0
-if (err < 0 && (nlh->nlmsg_flags & NLM_F_ACK))
-netlink_ack(skb, nlh, -err);
-#endif
-break;
-} else {
-#if 0
-if (nlh->nlmsg_flags & NLM_F_ACK)
-netlink_ack(skb, nlh, 0);
-#endif
-break;
-}
-skb_pull(sk

Re: [patch 2.6.12-rc1-mm4] fork_connector: add a fork connector

2005-04-08 Thread Jay Lan
Hi Evgeniy,
Forget about my previous request of a new patch.
The failures were straight forward enough to figure out.
- jay
Jay Lan wrote:
My workarea was based on 2.6.12-rc1-mm4 plus Guilluame's patch.
Your patch caused 5 out of 8 hunks failure at connector.c
and one failure at connector.h.
Could you generate a new patch based on my version? A tar
file of complete source of drivers/connector would work
also. :)
Thanks!
 - jay
Evgeniy Polyakov wrote:
Could you give attached patch a try instead of previous one.
It adds gfp mask into cn_netlink_send() call also.
If you need updated CBUS sources, feel free to ask, I will send 
updated sources with Andrew's comments resolved too.

I do not know exactly your connector version, so patch will probably 
be applied with fuzz.

feel free to contact if it does not apply, I will send
the whole sources.
Thank you.
* looking for [EMAIL PROTECTED]/connector--main--0--patch-38 to 
compare with
* comparing to [EMAIL PROTECTED]/connector--main--0--patch-38
M  connector.c
M  connector.h
M  cbus.c

* modified files
--- orig/drivers/connector/connector.c
+++ mod/drivers/connector/connector.c
@@ -70,7 +70,7 @@
  * then it is new message.
  *
  */
-void cn_netlink_send(struct cn_msg *msg, u32 __groups)
+void cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask)
 {
 struct cn_callback_entry *n, *__cbq;
 unsigned int size;
@@ -102,7 +102,7 @@
 
 size = NLMSG_SPACE(sizeof(*msg) + msg->len);
 
-skb = alloc_skb(size, GFP_ATOMIC);
+skb = alloc_skb(size, gfp_mask);
 if (!skb) {
 printk(KERN_ERR "Failed to allocate new skb with size=%u.\n", 
size);
 return;
@@ -119,11 +119,11 @@
 #endif

 NETLINK_CB(skb).dst_groups = groups;
-
-uskb = skb_clone(skb, GFP_ATOMIC);
-if (uskb)
+#if 0
+uskb = skb_clone(skb, gfp_mask);
+if (uskb && 0)
 netlink_unicast(dev->nls, uskb, 0, 0);
-   
+#endif   
 netlink_broadcast(dev->nls, skb, 0, groups, GFP_ATOMIC);
 
 return;
@@ -158,7 +158,7 @@
 }
 spin_unlock_bh(&dev->cbdev->queue_lock);
 
-return found;
+return (found)?0:-ENODEV;
 }
 
 /*
@@ -181,7 +181,6 @@
 "requested msg->len=%u[%u], nlh->nlmsg_len=%u, 
skb->len=%u.\n",
 msg->len, NLMSG_SPACE(msg->len + sizeof(*msg)),
 nlh->nlmsg_len, skb->len);
-kfree_skb(skb);
 return -EINVAL;
 }
 #if 0
@@ -215,17 +214,18 @@
skb->len, skb->data_len, skb->truesize, skb->protocol,
skb_cloned(skb), skb_shared(skb));
 #endif
-while (skb->len >= NLMSG_SPACE(0)) {
+if (skb->len >= NLMSG_SPACE(0)) {
 nlh = (struct nlmsghdr *)skb->data;
+
 if (nlh->nlmsg_len < sizeof(struct cn_msg) ||
 skb->len < nlh->nlmsg_len ||
 nlh->nlmsg_len > CONNECTOR_MAX_MSG_SIZE) {
-#if 0
+#if 1
 printk(KERN_INFO "nlmsg_len=%u, sizeof(*nlh)=%u\n",
nlh->nlmsg_len, sizeof(*nlh));
 #endif
 kfree_skb(skb);
-break;
+goto out;
 }
 
 len = NLMSG_ALIGN(nlh->nlmsg_len);
@@ -233,22 +233,11 @@
 len = skb->len;
 
 err = __cn_rx_skb(skb, nlh);
-if (err) {
-#if 0
-if (err < 0 && (nlh->nlmsg_flags & NLM_F_ACK))
-netlink_ack(skb, nlh, -err);
-#endif
-break;
-} else {
-#if 0
-if (nlh->nlmsg_flags & NLM_F_ACK)
-netlink_ack(skb, nlh, 0);
-#endif
-break;
-}
-skb_pull(skb, len);
+if (err < 0)
+kfree_skb(skb);
 }
-   
+
+out:
 kfree_skb(__skb);
 }
 
@@ -310,7 +299,7 @@
 m.ack = notify_event;
 
 memcpy(&m.id, id, sizeof(m.id));
-cn_netlink_send(&m, ctl->group);
+cn_netlink_send(&m, ctl->group, GFP_ATOMIC);
 }
 }
 spin_unlock_bh(¬ify_lock);

--- orig/include/linux/connector.h
+++ mod/include/linux/connector.h
@@ -148,7 +148,7 @@
 
 int cn_add_callback(struct cb_id *, char *, void (* callback)(void *));
 void cn_del_callback(struct cb_id *);
-void cn_netlink_send(struct cn_msg *, u32);
+void cn_netlink_send(struct cn_msg *, u32, int);
 
 int cn_queue_add_callback(struct cn_queue_dev *dev, struct 
cn_callback *cb);
 void cn_queue_del_callback(struct cn_queue_dev *dev, struct cb_id *id);




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2.6.12-rc1-mm4] fork_connector: add a fork connector

2005-04-08 Thread Jay Lan
My workarea was based on 2.6.12-rc1-mm4 plus Guilluame's patch.
Your patch caused 5 out of 8 hunks failure at connector.c
and one failure at connector.h.
Could you generate a new patch based on my version? A tar
file of complete source of drivers/connector would work
also. :)
Thanks!
 - jay
Evgeniy Polyakov wrote:
Could you give attached patch a try instead of previous one.
It adds gfp mask into cn_netlink_send() call also.
If you need updated CBUS sources, feel free to ask, 
I will send updated sources with Andrew's comments resolved too.

I do not know exactly your connector version, 
so patch will probably be applied with fuzz.

feel free to contact if it does not apply, I will send
the whole sources.
Thank you.
* looking for [EMAIL PROTECTED]/connector--main--0--patch-38 to compare with
* comparing to [EMAIL PROTECTED]/connector--main--0--patch-38
M  connector.c
M  connector.h
M  cbus.c
* modified files
--- orig/drivers/connector/connector.c
+++ mod/drivers/connector/connector.c
@@ -70,7 +70,7 @@
  * then it is new message.
  *
  */
-void cn_netlink_send(struct cn_msg *msg, u32 __groups)
+void cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask)
 {
 	struct cn_callback_entry *n, *__cbq;
 	unsigned int size;
@@ -102,7 +102,7 @@
 
 	size = NLMSG_SPACE(sizeof(*msg) + msg->len);
 
-	skb = alloc_skb(size, GFP_ATOMIC);
+	skb = alloc_skb(size, gfp_mask);
 	if (!skb) {
 		printk(KERN_ERR "Failed to allocate new skb with size=%u.\n", size);
 		return;
@@ -119,11 +119,11 @@
 #endif
 	
 	NETLINK_CB(skb).dst_groups = groups;
-
-	uskb = skb_clone(skb, GFP_ATOMIC);
-	if (uskb)
+#if 0
+	uskb = skb_clone(skb, gfp_mask);
+	if (uskb && 0)
 		netlink_unicast(dev->nls, uskb, 0, 0);
-	
+#endif	
 	netlink_broadcast(dev->nls, skb, 0, groups, GFP_ATOMIC);
 
 	return;
@@ -158,7 +158,7 @@
 	}
 	spin_unlock_bh(&dev->cbdev->queue_lock);
 
-	return found;
+	return (found)?0:-ENODEV;
 }
 
 /*
@@ -181,7 +181,6 @@
 "requested msg->len=%u[%u], nlh->nlmsg_len=%u, skb->len=%u.\n",
 msg->len, NLMSG_SPACE(msg->len + sizeof(*msg)),
 nlh->nlmsg_len, skb->len);
-		kfree_skb(skb);
 		return -EINVAL;
 	}
 #if 0
@@ -215,17 +214,18 @@
 	   skb->len, skb->data_len, skb->truesize, skb->protocol,
 	   skb_cloned(skb), skb_shared(skb));
 #endif
-	while (skb->len >= NLMSG_SPACE(0)) {
+	if (skb->len >= NLMSG_SPACE(0)) {
 		nlh = (struct nlmsghdr *)skb->data;
+
 		if (nlh->nlmsg_len < sizeof(struct cn_msg) ||
 		skb->len < nlh->nlmsg_len ||
 		nlh->nlmsg_len > CONNECTOR_MAX_MSG_SIZE) {
-#if 0
+#if 1
 			printk(KERN_INFO "nlmsg_len=%u, sizeof(*nlh)=%u\n",
 			   nlh->nlmsg_len, sizeof(*nlh));
 #endif
 			kfree_skb(skb);
-			break;
+			goto out;
 		}
 
 		len = NLMSG_ALIGN(nlh->nlmsg_len);
@@ -233,22 +233,11 @@
 			len = skb->len;
 
 		err = __cn_rx_skb(skb, nlh);
-		if (err) {
-#if 0
-			if (err < 0 && (nlh->nlmsg_flags & NLM_F_ACK))
-netlink_ack(skb, nlh, -err);
-#endif
-			break;
-		} else {
-#if 0
-			if (nlh->nlmsg_flags & NLM_F_ACK)
-netlink_ack(skb, nlh, 0);
-#endif
-			break;
-		}
-		skb_pull(skb, len);
+		if (err < 0)
+			kfree_skb(skb);
 	}
-			
+
+out:
 	kfree_skb(__skb);
 }
 
@@ -310,7 +299,7 @@
 			m.ack = notify_event;
 
 			memcpy(&m.id, id, sizeof(m.id));
-			cn_netlink_send(&m, ctl->group);
+			cn_netlink_send(&m, ctl->group, GFP_ATOMIC);
 		}
 	}
 	spin_unlock_bh(¬ify_lock);

--- orig/include/linux/connector.h
+++ mod/include/linux/connector.h
@@ -148,7 +148,7 @@
 
 int cn_add_callback(struct cb_id *, char *, void (* callback)(void *));
 void cn_del_callback(struct cb_id *);
-void cn_netlink_send(struct cn_msg *, u32);
+void cn_netlink_send(struct cn_msg *, u32, int);
 
 int cn_queue_add_callback(struct cn_queue_dev *dev, struct cn_callback *cb);
 void cn_queue_del_callback(struct cn_queue_dev *dev, struct cb_id *id);



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2.6.12-rc1-mm4] fork_connector: add a fork connector

2005-04-07 Thread Jay Lan
BTW, when it happened last time, my program listening to the socket
complained about duplicate messages received.
   Unmatched seq. Rcvd=1824062, expected=1824061   <===
   Unmatched seq. Rcvd=1824062, expected=1824063   <===
   Unmatched seq. Rcvd=1824348, expected=1824307
When my program received 1824062 while expecting 1824061
it adjusted itself to expect the next one being 1824063. But instead
the message of 1824062 arrived the second time.
   - jay
Jay Lan wrote:
Hi Evgeniy,
Should i be concerned about this bugcheck?
I have seen this happening a number of times, all with the same signature
in my testing. I ran a mix of AIM7,  ubench,  fork-test (continuously 
fork new
processes), and another program reading from the fork connector socket.

Thanks,
- jay

cqueue/1[656]: bugcheck! 0 [1]
Modules linked in: nfs lockd sunrpc sg st sr_mod ipv6 usbcore
Pid: 656, CPU 1, comm: cqueue/1
psr : 1010085a6010 ifs : 8289 ip  : 
[]Not tainted
ip is at __kfree_skb+0x1b0/0x220
unat:  pfs : 0289 rsc : 0003
rnat:  bsps:  pr  : 9641
ldrs:  ccv :  fpsr: 0009804c8a70433f
csd :  ssd : 
b0  : a001005cee50 b6  : a001e7e0 b7  : a001003ae440
f6  : 0fffbc8c0 f7  : 0ffdaa200
f8  : 18000 f9  : 10002a000
f10 : 0fffcc8c0 f11 : 1003e
r1  : a00100c0ec00 r2  : 4000 r3  : 4000
r8  : 0028 r9  : a001008eaac8 r10 : 0004
r11 : 0028 r12 : e0307a99fd60 r13 : e0307a998000
r14 : a00100887c00 r15 : a00100a24b18 r16 : a00100a22e18
r17 :  r18 : a00100887bec r19 : a00100a9080f
r20 : 3517 r21 : 000f r22 : 0034
r23 : 0034 r24 : a00100a90810 r25 : 3518
r26 : 3518 r27 : 0010085a6010 r28 : a00100a90811
r29 : 3519 r30 :  r31 : a00100a24ae8

Call Trace:
[] show_stack+0x80/0xa0
   sp=e0307a99f920 bsp=e0307a999078
[] show_regs+0x840/0x880
   sp=e0307a99faf0 bsp=e0307a999018
[] die+0x150/0x1c0
   sp=e0307a99fb00 bsp=e0307a998fd0
[] die_if_kernel+0x40/0x60
   sp=e0307a99fb00 bsp=e0307a998fa0
[] ia64_bad_break+0x300/0x380
   sp=e0307a99fb00 bsp=e0307a998f78
[] ia64_leave_kernel+0x0/0x280
   sp=e0307a99fb90 bsp=e0307a998f78
[] __kfree_skb+0x1b0/0x220
   sp=e0307a99fd60 bsp=e0307a998f30
[] kfree_skb+0x50/0xa0
   sp=e0307a99fd60 bsp=e0307a998f10
[] cn_queue_wrapper+0xe0/0x100
   sp=e0307a99fd60 bsp=e0307a998ee8
[] worker_thread+0x3e0/0x520
   sp=e0307a99fd60 bsp=e0307a998e60
[] kthread+0x290/0x300
   sp=e0307a99fdd0 bsp=e0307a998e20
[] kernel_thread_helper+0xe0/0x100
   sp=e0307a99fe30 bsp=e0307a998df0
[] start_kernel_thread+0x20/0x40
   sp=e0307a99fe30 bsp=e0307a998df0

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2.6.12-rc1-mm4] fork_connector: add a fork connector

2005-04-07 Thread Jay Lan
Hi Evgeniy,
Should i be concerned about this bugcheck?
I have seen this happening a number of times, all with the same signature
in my testing. I ran a mix of AIM7,  ubench,  fork-test (continuously 
fork new
processes), and another program reading from the fork connector socket.

Thanks,
- jay

cqueue/1[656]: bugcheck! 0 [1]
Modules linked in: nfs lockd sunrpc sg st sr_mod ipv6 usbcore
Pid: 656, CPU 1, comm: cqueue/1
psr : 1010085a6010 ifs : 8289 ip  : 
[]Not tainted
ip is at __kfree_skb+0x1b0/0x220
unat:  pfs : 0289 rsc : 0003
rnat:  bsps:  pr  : 9641
ldrs:  ccv :  fpsr: 0009804c8a70433f
csd :  ssd : 
b0  : a001005cee50 b6  : a001e7e0 b7  : a001003ae440
f6  : 0fffbc8c0 f7  : 0ffdaa200
f8  : 18000 f9  : 10002a000
f10 : 0fffcc8c0 f11 : 1003e
r1  : a00100c0ec00 r2  : 4000 r3  : 4000
r8  : 0028 r9  : a001008eaac8 r10 : 0004
r11 : 0028 r12 : e0307a99fd60 r13 : e0307a998000
r14 : a00100887c00 r15 : a00100a24b18 r16 : a00100a22e18
r17 :  r18 : a00100887bec r19 : a00100a9080f
r20 : 3517 r21 : 000f r22 : 0034
r23 : 0034 r24 : a00100a90810 r25 : 3518
r26 : 3518 r27 : 0010085a6010 r28 : a00100a90811
r29 : 3519 r30 :  r31 : a00100a24ae8

Call Trace:
[] show_stack+0x80/0xa0
   sp=e0307a99f920 bsp=e0307a999078
[] show_regs+0x840/0x880
   sp=e0307a99faf0 bsp=e0307a999018
[] die+0x150/0x1c0
   sp=e0307a99fb00 bsp=e0307a998fd0
[] die_if_kernel+0x40/0x60
   sp=e0307a99fb00 bsp=e0307a998fa0
[] ia64_bad_break+0x300/0x380
   sp=e0307a99fb00 bsp=e0307a998f78
[] ia64_leave_kernel+0x0/0x280
   sp=e0307a99fb90 bsp=e0307a998f78
[] __kfree_skb+0x1b0/0x220
   sp=e0307a99fd60 bsp=e0307a998f30
[] kfree_skb+0x50/0xa0
   sp=e0307a99fd60 bsp=e0307a998f10
[] cn_queue_wrapper+0xe0/0x100
   sp=e0307a99fd60 bsp=e0307a998ee8
[] worker_thread+0x3e0/0x520
   sp=e0307a99fd60 bsp=e0307a998e60
[] kthread+0x290/0x300
   sp=e0307a99fdd0 bsp=e0307a998e20
[] kernel_thread_helper+0xe0/0x100
   sp=e0307a99fe30 bsp=e0307a998df0
[] start_kernel_thread+0x20/0x40
   sp=e0307a99fe30 bsp=e0307a998df0

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2.6.12-rc1-mm4] fork_connector: add a fork connector

2005-03-31 Thread Jay Lan
Andrew Morton wrote:
Guillaume Thouvenin <[EMAIL PROTECTED]> wrote:
 

 This patch adds a fork connector in the do_fork() routine.
...
 The fork connector is used by the Enhanced Linux System Accounting
project http://elsa.sourceforge.net
   

Does it also meet all the in-kernel requirements for other accounting
projects?
If not, what else do those other projects need?
 

Hi Andrew,
As the discussion in this thread of the past few days showed, this
patch intends to take care of process grouping, but not the
accounting data collection. Besides my concern of possibility of
data loss, this patch also provides CSA information to handle process
grouping as it intends to do. I plan to run some testing to see percentage
of data loss when system is under stress test, but improvement at
CBUS as Evgeniy indicated should help!
Please be advised that i still need an do_exit handling to save accounting
data. But, it is a separate issue.
Thanks,
- jay
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/2] fork_connector: add a fork connector

2005-03-30 Thread Jay Lan
The parent information ((ppid,pid) pair) is useful for process group
aggregation while do_exit() hook is needed to save per task
accounting data before the task data is disposed.
Thanks,
 - jay
dean gaudet wrote:
On Tue, 29 Mar 2005, Jay Lan wrote:

The fork_connector is not designed to solve accounting data collection
problem.
The accounting data collection must be done via a hook from do_exit().

by the time do_exit() occurs the parent may have disappeared... you do 
need to record something at fork() time so that you can account to the 
correct ancestor.

an example of where this ancestry is useful would be the summation of all 
cpu time spent by children of apache, spamd, clamd, ...

-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/2] fork_connector: add a fork connector

2005-03-29 Thread Jay Lan
Paul,
The fork_connector is not designed to solve accounting data collection
problem.
The accounting data collection must be done via a hook from do_exit().
The acct_process() hook invokes do_acct_process() to write BSD
accounting data to disk. CSA needs a similar hook off do_exit() to
collect more accounting data and write to disk in different
accounting records format. This part is not part of fork_connector.
ELSA does not care how accounting data being written to disks. However,
it needs the accounting data being reliably and accurately collected
and written to disk by BSD and/or CSA and it needs the 
information to aggregate processes. It was never the fork_connector's
intention to piggy back the data to the accounting file.
Thanks,
 - jay

Paul Jackson wrote:
Evgeniy writes:
Here forking connector module "exits" and can handle next fork() on the
same CPU.

Fine ... but it's not about what the fork_connector does.  It's about
getting the accounting data to disk, if I understand correctly.

That is why it is very fast in "fast-path".

I don't care how fast a tool is.  I care how fast the job gets done.  If
a tool is only doing part of the job, then we can't decide whether to
use that tool just based on how fast that part of the job gets done.

The most expensive part is cn_netlink_send()/netlink_broadcast(), 
with CBUS it is deferred to the safe time,

This is "safe time" for the immediate purpose of letting the forking
process continue on its way.  But the deferred work of buffering up the
data and writing it to disk still needs to be done, pretty soon.  When
sizing a system to see how many users or jobs I can run on it at a time,
I will have to include sufficient cpu, memory and disk i/o to handle
getting this accounting data to disk, right?

2) Using a modified form of what BSD ACCOUNTING does now:
- forking process appends single fork data to in-kernel buffer
It is not as simple.
It takes global locks several times, it access bunch of shared between
CPU data.
It calls ->stat() and ->write() which may sleep.

Hmmm ... good points.  The mechanisms in the kernel now (and for the
last 25 years) to write out BSD ACCOUNTING data may not be numa friendly.
Perhaps there should be a per-cpu 512 byte buffer, which can gather up 8
accounting records (64 bytes each) and only call the file system write
once every 8 task exits.  Or perhaps a per-node buffer, with a spinlock
to serialize access by the CPUs on that node.  Or perhaps per-node
accounting files.  Or something like that.
Guillaume, Jay - do we (you ?) need to make classic BSD ACCOUNTING data
collection numa friendly?  Based on the various frustrated comments at
the top of kernel/acct.c, this could be a non-trivial effort to get
right.  Maybe we need it, but can't afford it.
And perhaps my proposed variable length records for supplementary
accounting, such as  from fork, need to allow
for some way to pad out the rest of a buffer, when the next record
won't fit entirely.

That work is deferred and does not affect in-kernel processes.

The accounting data collection cannot be deferred for long, perhaps
just a few minutes.  Not until the data hits the disk can we rest
indefinitely.  Unless, that is, I don't understand what problem is
being solved here (quite possible ;).

And why userspace fork connector should write data to the disk?

I NEVER said it should.  I am NOT trying to redesign fork_connector.
Good grief ... how many times and ways do I have to say this ;)?
I am asking what is the best tool for accounting data collection,
which, if I understand correctly, does need to write to disk.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/2] fork_connector: add a fork connector

2005-03-29 Thread Jay Lan
Paul Jackson wrote:
Guillaume wrote:
 The goal of the fork connector is to inform a user space application
that a fork occurs in the kernel. This information (cpu ID, parent PID
and child PID) can be used by several user space applications. It's not
only for accounting. Accounting and fork_connector are two different
things and thus, fork_connector doesn't do the merge of any kinds of
data (and it will never do). 

Yes - it is clear that the fork_connector does this - inform user space
of fork information .  I'm not saying that
fork_connector should merge data; I'm observing that it doesn't, and
that this would seem to serve the needs of accounting poorly.
Paul,
You probably can look at it this way: the accounting data being
written out by BSD are per process data and the fork connector
provides information needed to group processes into process
aggregates.
Thanks,
 - jay
Out of curiosity, what are these 'several user space applications?'  The
only one I know of is this extension to bsd accounting to include
capturing parent and child pid at fork.  Probably you've mentioned some
other uses of fork_connector before here, but I missed it.

The relayfs is done, like Evgeniy said, for large amount of
datas. So I think that it's not suitable for what we want to achieve
with the fork connector.

I never claimed that relayfs was appropriate for fork_connector.
I'm not trying to tape a rock to Evgeniy's screwdriver.  I'm saying that
accounting looks like a nail to me, so let us see what rocks and hammers
we have in our tool box.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/2] fork_connector: add a fork connector

2005-03-22 Thread Jay Lan
Evgeniy Polyakov wrote:
On Tue, 22 Mar 2005 10:26:19 -0800
Ram <[EMAIL PROTECTED]> wrote:

On Mon, 2005-03-21 at 23:07, Guillaume Thouvenin wrote:
On Mon, 2005-03-21 at 12:52 -0800, Ram wrote:
If a bunch of applications are listening for fork events, 
your patch allows any application to turn off the 
fork event notification?  Is this the right behavior?
Yes it is. The main management is done by application so, if several
applications are listening for fork events you need to choose which one
will turn off the fork connector. 

I want to keep this turn on/off mechanism simple but if it's needed I
can manage the variable "cn_fork_enable" as a counter. Thus the callback
could be something like:
static void cn_fork_callback(void *data)
{
 int start; 
 struct cn_msg *msg = (struct cn_msg *)data;

 if (cn_already_initialized && (msg->len == sizeof(cn_fork_enable))) {
   memcpy(&start, msg->data, sizeof(cn_fork_enable));
   if (start)
 cn_fork_enable++;
   else
 cn_fork_enable > 0 ? cn_fork_enable-- : 0;
 }
}
I think a better way is:
  Providing a different connector channel called the administrator 
  channel which can be used only by a super-user, and gives you
  the ability to switch on or off any connector channel including the
  fork-connector channel.

Only super-user can bind netlink socket to multicast group.

  For lack of better term I am using the word 'channel' to mean
  something that carries events of particular type through the
  connector-infrastructure.

I still do not see why it is needed.
Super-user can run ip command and turn network interface off
not waiting while apache or named exits or unbind.
You can turn off a network interface and turn it back on without
closing a network application and it will continue to work.
In theory I can create some kind of userspace registration mechanism,
when userspace application reports it's pid to the connector, 
and then it sends data to the specified pids, but does not 
allow controlling from userspace.
But I really do not think it is a good idea to permit
non-priviledged userspace processes to know about deep
kernel internals through connector's messages.
I see this issue less a case of bad guys vs. good guys. I see it
as various components doing system related work, but there is
no mechanism of knowing who is on who is off by today's patch. A
service listening to the fork connector can request to turn off
cn_fork_enable on exit and inadquately affect other services/daemons
listening to the same connector. It is not acceptable in my opinion.
The idea of implementing fork connector enabling/disabling was so
that the kernel does not waste time writing to the sockets if no
application listening. If implementing such a feature costs
more than it saves, maybe the fork connector should simply always
send?
Thanks,
- jay


RP

What do you think about this implementation? 

Guillaume

Evgeniy Polyakov
Only failure makes us experts. -- Theo de Raadt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/2] fork_connector: add a fork connector

2005-03-22 Thread Jay Lan
Ram wrote:
On Tue, 2005-03-22 at 11:22, Evgeniy Polyakov wrote:
On Tue, 22 Mar 2005 10:26:19 -0800
Ram <[EMAIL PROTECTED]> wrote:

On Mon, 2005-03-21 at 23:07, Guillaume Thouvenin wrote:
On Mon, 2005-03-21 at 12:52 -0800, Ram wrote:
If a bunch of applications are listening for fork events, 
your patch allows any application to turn off the 
fork event notification?  Is this the right behavior?
Yes it is. The main management is done by application so, if several
applications are listening for fork events you need to choose which one
will turn off the fork connector. 

I want to keep this turn on/off mechanism simple but if it's needed I
can manage the variable "cn_fork_enable" as a counter. Thus the callback
could be something like:
static void cn_fork_callback(void *data)
{
 int start; 
 struct cn_msg *msg = (struct cn_msg *)data;

 if (cn_already_initialized && (msg->len == sizeof(cn_fork_enable))) {
   memcpy(&start, msg->data, sizeof(cn_fork_enable));
   if (start)
 cn_fork_enable++;
   else
 cn_fork_enable > 0 ? cn_fork_enable-- : 0;
 }
}
I think a better way is:
  Providing a different connector channel called the administrator 
  channel which can be used only by a super-user, and gives you
  the ability to switch on or off any connector channel including the
  fork-connector channel.
Only super-user can bind netlink socket to multicast group.

ok. I did not realize that.

  For lack of better term I am using the word 'channel' to mean
  something that carries events of particular type through the
  connector-infrastructure.
I still do not see why it is needed.
Super-user can run ip command and turn network interface off
not waiting while apache or named exits or unbind.
In theory I can create some kind of userspace registration mechanism,
when userspace application reports it's pid to the connector, 
and then it sends data to the specified pids, but does not 
allow controlling from userspace.
But I really do not think it is a good idea to permit
non-priviledged userspace processes to know about deep
kernel internals through connector's messages.

Yes. non-priviledged userspace processes should not know
any deep kernel internals through connector events.
I think what I am driving at is, an application that is critically
dependent on the fork-notification, suddenly stops receiving such
notification because some other application has switched off the 
service without its notice. 

the reason I am concerned is I am planning to feed this fork-events
to my in-kernel module. Side note: I would really like support for
in-kernel listners through connector infrastructure.
Listners in the kernel? Sound like a PAGG thing again!
Guillaume's stuff was originally an in-kernel module. CSA/Job
are kernel modules also. We all use hook management feature of PAGG,
which offer event notification to other kernel components.
Guillaume moved to connector approach and i am moving to that
direction too because Andrew likes to see in-kernel modules moved
to user space.
It is kind of funny to see someone trying to get a notification
through a connector and then feed it back to the kernel! This
does not sound right to me!
- jay
RP

RP

What do you think about this implementation? 

Guillaume

Evgeniy Polyakov
Only failure makes us experts. -- Theo de Raadt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/2] fork_connector: add a fork connector

2005-03-22 Thread Jay Lan
Guillaume Thouvenin wrote:
On Mon, 2005-03-21 at 12:52 -0800, Ram wrote:
If a bunch of applications are listening for fork events, 
your patch allows any application to turn off the 
fork event notification?  Is this the right behavior?

Yes it is. The main management is done by application so, if several
applications are listening for fork events you need to choose which one
will turn off the fork connector. 
It is not practical. One listener never know who else are listening
to the fork connector.
We also need to protect yet another global variable "cn_fork_enable".
Also, in order to implement "cn_fork_enable" as a counter, we need
some sort of registration to prevent an application from sending
repeated "off" notification. One can only turn off its own switch.
Thanks,
 - jay
I want to keep this turn on/off mechanism simple but if it's needed I
can manage the variable "cn_fork_enable" as a counter. Thus the callback
could be something like:
static void cn_fork_callback(void *data)
{
  int start; 
  struct cn_msg *msg = (struct cn_msg *)data;

  if (cn_already_initialized && (msg->len == sizeof(cn_fork_enable))) {
memcpy(&start, msg->data, sizeof(cn_fork_enable));
if (start)
  cn_fork_enable++;
else
  cn_fork_enable > 0 ? cn_fork_enable-- : 0;
  }
}
What do you think about this implementation? 

Guillaume
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: I/O and Memory accounting...

2005-03-09 Thread Jay Lan
I thought you planned to read from CSA pacct file?
Well, while we are in discussion of whether to merge and
replace BSD accounting with CSA accounting, your proposed
change will provide you data on charater I/O in a BSD pacct
file. I supposed you do not need to have seperate fields on
character-read and character-written? CSA will provide the
data separately.
CSA writes the data to a CSA pacct file in a similar way as
BSD on exit callback at do_acct_process(). The CSA's exit
callback is implemented as a loadable module. The CSA
project and code can be downloaded at
http://oss.sgi.com/projects/csa.
Cheers,
 - jay
Guillaume Thouvenin wrote:
Hello,
  In the ChangeLog-2.6.11 file I read that the enhanced I/O accounting
data patch and the enhanced memory accounting data collection patch were
added. It's cool but I don't see how this stuff is used because
information is never dump in a file or send to an accounting application
(or I miss something). 
 
  Maybe we should update the ac_io in the "struct acct"? Thus, values
will be dump in the accounting file. Maybe it could be something like:

--- acct.c.orig 2005-03-09 14:17:07.0 +0100
+++ acct.c  2005-03-09 14:18:20.0 +0100
@@ -477,8 +477,8 @@ static void do_acct_process(long exitcod
}
vsize = vsize / 1024;
ac.ac_mem = encode_comp_t(vsize);
-   ac.ac_io = encode_comp_t(0 /* current->io_usage */); /* %% */
-   ac.ac_rw = encode_comp_t(ac.ac_io / 1024);
+   ac.ac_io = encode_comp_t(current->rchar + current->wchar);
+   ac.ac_rw = encode_comp_t(0);
ac.ac_minflt = encode_comp_t(current->signal->min_flt +
 current->group_leader->min_flt);
ac.ac_majflt = encode_comp_t(current->signal->maj_flt +
For memory and read/write syscall we may add new fields. 

Best regards,
Guillaume
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.11-rc4-mm1] end-of-proces handling for acct-csa

2005-03-07 Thread Jay Lan
The patch i propose is tiny, simple and straight forward. It
touches only one file and leaves the CSA code in a  configurable
loadable module. It broke nobody's code and it does not need to
redesign existing BSD kernel code and utilities.
If we are to merge the code, there are some detailed discussion
needed to happen on implementation deail. We are talking about
supporting two different internal formats by one piece of code.
We need to maintain BSD acct format because BSD utilities count
on it.
If we are to combine two formats into one, we then need to
modify BSD utilies to understand the new format. How about
the backwards compatibility?
Now this is really an overkill. All i asked for was only
adding a few lines to acct.c.
Thanks,
 - jay
Tim Schmielau wrote:
On Wed, 2 Mar 2005, Guillaume Thouvenin wrote:

Is it possible to merge BSD and CSA? I mean with CSA, there is a part
that does per-process accounting. For exemple in the
linux-2.6.9.acct_mm.patch the two functions update_mem_hiwater() and
csa_update_integrals() update fields in the current (and parent)
process. So maybe you can improve the BSD per-process accounting or
maybe CSA can replace the BSD per-process accounting?

Yes, that was also my preferred direction - make CSA able to also write
BSD acct format, and replace the existing BSD accounting with CSA.
However it seems this will still increase the amount of kernel code quite 
a bit.

Sorry for not going into any details, I have to leave right now and will 
be offline for two weeks.

Tim
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.11-rc4-mm1] end-of-proces handling for acct-csa

2005-03-02 Thread Jay Lan
I did not look into the userspace commands supported in BSD
accounting on the dependency on the format of /var/account/pacct
file.
The accounting exit hook allows BSD/CSA to save accounting
data stored in task_struct to internally kept data structure
and then writes to their respective accounting file. The
format of the accounting output file of the two are different,
with CSA being the superset of BSD and also with CSA carrying
concept of grouping of processes.
It would be an interesting project to merge BSD and CSA, but
it is going to take some surgical work on both products, not
an easy one.
- jay
Guillaume Thouvenin wrote:
On Tue, 2005-03-01 at 10:06 -0800, Jay Lan wrote:
Sorry I was not clear on my point.
I was trying to point out that, an exit hook for BSD and CSA is
essential to save accounting data before the data is gone. That
can not be done with a netlink.
So, my patch was to keep acct_process as a wrapper, which
would then call do_exit_csa() for CSA and call do_acct_process
for BSD.

Is it possible to merge BSD and CSA? I mean with CSA, there is a part
that does per-process accounting. For exemple in the
linux-2.6.9.acct_mm.patch the two functions update_mem_hiwater() and
csa_update_integrals() update fields in the current (and parent)
process. So maybe you can improve the BSD per-process accounting or
maybe CSA can replace the BSD per-process accounting?
Guillaume  
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.11-rc4-mm1] end-of-proces handling for acct-csa

2005-03-01 Thread Jay Lan
Sorry I was not clear on my point.
I was trying to point out that, an exit hook for BSD and CSA is
essential to save accounting data before the data is gone. That
can not be done with a netlink.
So, my patch was to keep acct_process as a wrapper, which
would then call do_exit_csa() for CSA and call do_acct_process
for BSD.
Thanks,
 - jay
Guillaume Thouvenin wrote:
On Mon, 2005-02-28 at 10:56 -0800, Jay Lan wrote:
The exit hook is essential for CSA to save off data before the data
is gone, A netlink type of thing does not help. BSD is in the same
situation. You can not replace the acct_process() call with a netlink.
If ELSA is to use the enhanced accounting data, it needs the CSA
eop handling at exit as well.

Why replace the acct_process()? The problem here is to add a new hook in
the do_fork() and you can use the BSD accounting hook acct_process()
which is already in the exit() routine. We don't need to replace it with
a netlink because today there are no user space applications that need
it. 

Best regards,
Guillaume 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.11-rc4-mm1] end-of-proces handling for acct-csa

2005-02-28 Thread Jay Lan
Hi Andrew,
You asked:
>
> In other words: given that ELSA can do its thing via existing accounting
> interfaces and a fork notifier, why does CSA need to add lots more kernel
> code?
And i explained:
> Here are some codes from do_exit() starting line 813 (based on
> 2.6.11-rc4-mm1):
>
> 813acct_update_integrals(tsk);
> 814update_mem_hiwater(tsk);
> 815group_dead = atomic_dec_and_test(&tsk->signal->live);
> 816if (group_dead) {
> 817del_timer_sync(&tsk->signal->real_timer);
> 818acct_process(code);
> 819}
> 820exit_mm(tsk);
>
> The acct_process() is called to save off BSD accounting data at
> line 818. The next statement at 820, tsk->mm is disposed and all
> data saved at tsk->mm is gone, including memory hiwater marks
> information saved at line 814. The complete tsk is disposed
> before exit of do_exit() routine.
I was hoping Guilluame could jump in himself...
But, in a separate discussion thread, he wrote:
>> (Jay asked)
>> I spent some time trying to understand how ELSA save the per-process
>> accounting data before the task_struct data structure gets disposed,
>> but failed to find anything. My assumption would be that ELSA does
>> not collection those data in the kernel itself? Instead, it would
>> read the pacct file created by either BSD or CSA?
>
> Yes you're right, ELSA reads accounting file created by BSD or CSA. We
> don't make accounting. I think that the communication problem is here
Guilluame's reply should answer your question. ELSA does not collect
accounting data in the kernel as BSD/CSA does. It uses accounting
data collected by BSD/CSA.
The exit hook is essential for CSA to save off data before the data
is gone, A netlink type of thing does not help. BSD is in the same
situation. You can not replace the acct_process() call with a netlink.
If ELSA is to use the enhanced accounting data, it needs the CSA
eop handling at exit as well.
Thanks,
 - jay
Guillaume Thouvenin wrote:
On Thu, 2005-02-24 at 20:46 -0800, Andrew Morton wrote:
Jay Lan <[EMAIL PROTECTED]> wrote:
Since my idea of providing an accounting framework was considered
'overkill', here i submit a tiny patch just to allow CSA to
handle end-of-process (eop) situation by saving off accounting
data before a task_struct is disposed.
This patch is to modify the acct_process() in acct.c, which is
invoked from do_exit() to handle eop for BSD accounting. Now
the acct_process() wrapper will also take care of CSA, if it has
been loaded. If the CSA module has been loaded, a CSA routine
will be invoked to construct a CSA job record and to write the
record to the CSA accounting file.
I really don't want to have to (and shouldn't need to) become an accounting
person, but as there seems to be a communication problem somewhere..
Please, you guys are the subject matter experts.  Put your heads together
and come up with something.

I completely agree with that and we will continue this conversation in
private with Jay and all people involved to come up with an appropriate
solution.
Guillaume
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-25 Thread Jay Lan
Andrew Morton wrote:
Jay Lan <[EMAIL PROTECTED]> wrote:
Andrew Morton wrote:
> Kaigai Kohei <[EMAIL PROTECTED]> wrote:
> 
>>In my understanding, what Andrew Morton said is "If target functionality can
>> implement in user space only, then we should not modify the kernel-tree".
> 
> 
> fork, exec and exit upcalls sound pretty good to me.  As long as
> 
> a) they use the same common machinery and
> 
> b) they are next-to-zero cost if something is listening on the netlink
>socket but no accounting daemon is running.
> 
> Question is: is this sufficient for CSA?

Yes, fork, exec, and exit upcalls are sufficient for CSA.
The framework i proposed earlier should satisfy your requirement a
and b, and provides upcalls needed by BSD, ELSA and CSA. Maybe i
misunderstood your concern of the 'very light weight' framework
i proposed besides being "overkill"?

"upcall" is poorly defined.
What I meant was that ELSA can perform its function when the kernel merely
sends asynchronous notifications of forks out to userspace via netlink.
Further, I'm wondering if CSA can perform its function with the same level
of kernel support, perhaps with the addition of netlink-based notification
of exec and exit as well.
The framework patch which you sent was designed to permit the addition of
more kernel accounting code, which is heading in the opposite direction.
In other words: given that ELSA can do its thing via existing accounting
interfaces and a fork notifier, why does CSA need to add lots more kernel
code?
Here are some codes from do_exit() starting line 813 (based on
2.6.11-rc4-mm1):
813acct_update_integrals(tsk);
814update_mem_hiwater(tsk);
815group_dead = atomic_dec_and_test(&tsk->signal->live);
816if (group_dead) {
817del_timer_sync(&tsk->signal->real_timer);
818acct_process(code);
819}
820exit_mm(tsk);
The acct_process() is called to save off BSD accounting data at
line 818. The next statement at 820, tsk->mm is disposed and all
data saved at tsk->mm is gone, including memory hiwater marks
information saved at line 814. The complete tsk is disposed
before exit of do_exit() routine.
In separate emails discussion thread among interested parties,
i asked Guillaume to clarify this question. I suspect ELSA counts
on BSD's acct_process() at line 818 to save most accounting data.
If that is the case and since ELSA wants extended accounting data
collection, a way to save the extended acct data would be essential
to ELSA as well.
I can better asnwer your "why ELSA can do but CSA can't" question
after i learn more from Guilluame.
Later,
 - jay
---
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
___
Lse-tech mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/lse-tech
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-25 Thread Jay Lan
Chris Wright wrote:
* Jay Lan ([EMAIL PROTECTED]) wrote:
Andrew Morton wrote:
Kaigai Kohei <[EMAIL PROTECTED]> wrote:

In my understanding, what Andrew Morton said is "If target functionality 
can
implement in user space only, then we should not modify the kernel-tree".

fork, exec and exit upcalls sound pretty good to me.  As long as
a) they use the same common machinery and
b) they are next-to-zero cost if something is listening on the netlink
 socket but no accounting daemon is running.
Question is: is this sufficient for CSA?
Yes, fork, exec, and exit upcalls are sufficient for CSA.

As soon as you want to throttle tasks at the Job level, this would be
insufficient.  But, IIRC, that's not one of PAGG/Job/CSA's requirements
right?
PAGG serves more than JOB+CSA.
I am looking into possiblity/feasibility of implementing JOB at
userspace. However, even with JOB as a kernel module, the fork,
exec and exit upcalls would be sufficient to support JOB+CSA.
Thanks,
 - jay
thanks,
-chris
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-25 Thread Jay Lan
Andrew Morton wrote:
Kaigai Kohei <[EMAIL PROTECTED]> wrote:
In my understanding, what Andrew Morton said is "If target functionality can
implement in user space only, then we should not modify the kernel-tree".

fork, exec and exit upcalls sound pretty good to me.  As long as
a) they use the same common machinery and
b) they are next-to-zero cost if something is listening on the netlink
   socket but no accounting daemon is running.
Question is: is this sufficient for CSA?
Yes, fork, exec, and exit upcalls are sufficient for CSA.
The framework i proposed earlier should satisfy your requirement a
and b, and provides upcalls needed by BSD, ELSA and CSA. Maybe i
misunderstood your concern of the 'very light weight' framework
i proposed besides being "overkill"?
- jay
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2.6.11-rc4-mm1] end-of-proces handling for acct-csa

2005-02-24 Thread Jay Lan
Since my idea of providing an accounting framework was considered
'overkill', here i submit a tiny patch just to allow CSA to
handle end-of-process (eop) situation by saving off accounting
data before a task_struct is disposed.
This patch is to modify the acct_process() in acct.c, which is
invoked from do_exit() to handle eop for BSD accounting. Now
the acct_process() wrapper will also take care of CSA, if it has
been loaded. If the CSA module has been loaded, a CSA routine
will be invoked to construct a CSA job record and to write the
record to the CSA accounting file.
This patch only touchs one file: kernel/acct.c.
Signed-off-by: Jay Lan <[EMAIL PROTECTED]>

Index: linux/kernel/acct.c
===
--- linux.orig/kernel/acct.c2005-02-24 15:55:05.519092861 -0800
+++ linux/kernel/acct.c 2005-02-24 16:33:56.381584083 -0800
@@ -73,6 +73,11 @@ int acct_parm[3] = {4, 2, 30};
 /*
  * External references and all of the globals.
  */
+
+/* do_exit hook used by CSA */
+void (*do_exit_csa)(int, struct task_struct *) = NULL;
+EXPORT_SYMBOL_GPL(do_exit_csa);
+
 static void do_acct_process(long, struct file *);
 
 /*
@@ -504,12 +509,17 @@ static void do_acct_process(long exitcod
 }
 
 /*
- * acct_process - now just a wrapper around do_acct_process
+ * acct_process - now just a wrapper around
+ * do_acct_process - for BSD accounting
+ * do_exit_csa - for CSA
  */
 void acct_process(long exitcode)
 {
struct file *file = NULL;
 
+   if (do_exit_csa != NULL)
+   do_exit_csa(exitcode, current);
+
/*
 * accelerate the common fastpath:
 */


Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-24 Thread Jay Lan
Tim Schmielau wrote:
On Tue, 22 Feb 2005, Andrew Morton wrote:

We really want to avoid doing such stuff in-kernel if at all possible, of
course.
Is it not possible to implement the fork/exec/exit notifications to
userspace so that a daemon can track the process relationships and perform
aggregation based upon individual tasks' accounting?  That's what one of
the accounting systems is proposing doing, I believe.
(In fact, why do we even need the notifications?  /bin/ps can work this
stuff out).

I had started a proof of concept implementation that could reconstruct the 
whole process tree from userspace just from the BSD accounting currently 
in the kernel (+ the conceptual bug-fix that I misnamed "[RFC] "biological 
parent" pid"). This could do the whole job ID thing from userspace.
Unfortunately, I haven't had time to work on it recently.

Also, doing per-job accounting might actually be more lightweight than 
per-process accounting, so I'm not at all opposed to unifying CSA and BSD 
accounting into one mechanism that just writes different file formats.
Thanks, Tim!
After spending some time studying how ELSA works, it appeared to me
that CSA still needs a hook for do_exit. Since people agreed that
a complete framework was an overkill, i would be glad to submit
another patch later just to provide a CSA exit-handling inside the
acct_process().
Thanks,
 - jay
A complete framework seems like overkill to me, too.
Tim
---
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
___
Lse-tech mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/lse-tech
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-23 Thread Jay Lan
Hi Paul,
I think the microbenchmarking your link provides is irrelevant.
Your link provides benchmarking of doing a fork.
However, we are talking about inserting a callback routine
in a fork and/or an exit. The overhead is a function
call and time spent in the routine. The callback routine
can be configured to "do {} while (0)" if a certain CONFIG
flag is not set.
Thanks,
 - jay
Paul Jackson wrote:
So, I think such a fork/execve/exit hooks is harmless now.

I don't recall seeing any microbenchmarking of the impact on fork/exit
of such hooks.  You might find such a benchmark in lmbench, or at
http://bulk.fefe.de/scalability/.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-23 Thread Jay Lan
Kaigai Kohei wrote:
Hi, Thanks for your comments.
 >> I think there are two issues about system accounting framework.
 >>
 >> Issue: 1) How to define the appropriate unit for accounting ?
 >> Current BSD-accountiong make a collection per process accounting
 >> information.
 >> CSA make additionally a collection per process-aggregation accounting.
 >
 >
 > The 'enhanced acct data collection' patches that were added to
 > 2-6-11-rc* tree still do collection of per process data.
Hmm, I have not noticed this extension. But I made sure about it.
The following your two patches implements enhanced data collection, 
didn't it?
Yes!
- ChangeLog for 2.6.11-rc1
[PATCH] enhanced I/O accounting data patch
[PATCH] enhanced Memory accounting data collection
Since making a collection per process accounting is unified to the stock 
kernel,
I want to have a discussion about remaining half, "How to define the 
appropriate
unit for accounting ?"
We can agree that only per process-accounting is so rigid, I think.
Then, process-aggregation should be provided in one way or another.

[1] Is it necessary 'fork/exec/exit' event handling framework ?
The common agreement for the method of dealing with process aggregation
has not been constructed yet, I understood. And, we will not able to
integrate each process aggregation model because of its diverseness.
For example, a process which belong to JOB-A must not belong any other
'JOB-X' in CSA-model. But, In ELSA-model, a process in BANK-B can 
concurrently
belong to BANK-B1 which is a child of BANK-B.

And, there are other defferences:
Whether a process not to belong to any process-aggregation is permitted 
or not ?
Whether a process-aggregation should be inherited to child process or not ?
(There is possibility not to be inherited in a rule-based process 
aggregation like CKRM)
Guillaume answered this question, and i think a policy would work.
Some process-aggregation model have own philosophy and implemantation,
so it's hard to integrate. Thus, I think that common 'fork/exec/exit' 
event handling
framework to implement any kinds of process-aggregation.
BSD needs an exit hook and ELSA needs a fork hook. I am still
evaluating whether CSA can use the ELSA module. If CSA can use the
ELSA module, CSA maybe would be fine with the fork hook.

[2] What implementation should be adopted ?
I think registerable hooks on fork/execve/exit is necessary, not only 
exit() hook.
Because a rule or policy based process-aggregation model requirees to catch
the transition of a process status.
It might be enough to hook the exit() event only in process-accounting,
but it's not kind for another customer.

Thus, I recommend SGI's PAGG.
In my understanding, the reason for not to include such a framework is that
increase of unidentifiable (proprietary) modules is worried.
If we code the hooks explicitly in the kernel, such as in acct.c,
then the concern of unidentifiable modules should be taken care of.
A registerable framework was my preference. But if that causes
concern, it would be fine for me to do it explicit way. An
example is for acct_process() to invoke do_acct_process().
That means whoever intends to use even an existing hook needs to
present their cases, i guess.
Thanks,
 - jay

But, SI can divert LSM to implemente process-aggregation if they ignore
the LSM's original purpose, for example.
# I'm strongly opposed to such a movement as a SELinux's user :-)
So, I think such a fork/execve/exit hooks is harmless now.
Is this the time to unify it?
Thanks.
 > CSA added those per-process data to per-aggregation ("job") data
 > structure at do_exit() time when a process termintes.
 >
 >>
 >> It is appropriate to make the fork-exit event handling framework for
 >> definition
 >> of the process-aggregation, such as PAGG.
 >>
 >> This system-accounting per process-aggregation is quite useful,
 >> thought I tried the SGI's implementation named 'job' in past days.
 >>
 >>
 >> Issue: 2) What items should be collected for accounting information ?
 >> BSD-accounting collects PID/UID/GID, User/Sys/Elapsed-Time, and # of
 >> minor/major page faults. SGI's CSA collects VM/RSS size on exit time,
 >> Integral-VM/RSS, and amount of block-I/O additionally.
 >
 >
 > These data are now collected in 2.6.11-rc* code. Note that these data
 > are still per-process.
 >
 >>
 >> I think it's hard to implement the accounting-engine as a kernel 
loadable
 >> module using any kinds of framework. Because, we must put callback
 >> functions
 >> into all around the kernel for this purpose.
 >>
 >> Thus, I make a proposion as follows:
 >> We should separate the process-aggregation functionality and collecting
 >> accounting informations.
 >
 >
 > I totally agree with this! Actually that was what we have done. The data
 > collection part of code has been unified.
 >
 >> Something of framework to implement process-aggregation is necessary.
 >> And, making a collection of accounting information should be merged
 >> into BSD-accounting and im

Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-23 Thread Jay Lan
Guillaume Thouvenin wrote:
On Tue, 2005-02-22 at 23:20 -0800, Andrew Morton wrote:
Kaigai Kohei <[EMAIL PROTECTED]> wrote:
The common agreement for the method of dealing with process aggregation
has not been constructed yet, I understood. And, we will not able to
integrate each process aggregation model because of its diverseness.
For example, a process which belong to JOB-A must not belong any other
'JOB-X' in CSA-model. But, In ELSA-model, a process in BANK-B can concurrently
belong to BANK-B1 which is a child of BANK-B.
And, there are other defferences:
Whether a process not to belong to any process-aggregation is permitted or not ?
Whether a process-aggregation should be inherited to child process or not ?
(There is possibility not to be inherited in a rule-based process aggregation 
like CKRM)
Some process-aggregation model have own philosophy and implemantation,
so it's hard to integrate. Thus, I think that common 'fork/exec/exit' event 
handling
framework to implement any kinds of process-aggregation.

I can add "policies". With ELSA, a process belongs to one or several
groups and if a process is removed from one group, its children still
belong to the group. Thus a good idea could be to associate a
"philosophy" to a group. For exemple, when a group of processes is
created it can be tagged as UNIQUE or SHARED. UNIQUE means that a
process that belongs to it could not be added in another group by
opposition to SHARED. It's not needed inside the kernel.
This makes sense to me. CSA can use the UNIQUE policy to enforce
its "can't escape from job container" philisophy.

We really want to avoid doing such stuff in-kernel if at all possible, of
course.
Is it not possible to implement the fork/exec/exit notifications to
userspace so that a daemon can track the process relationships and perform
aggregation based upon individual tasks' accounting?  That's what one of
the accounting systems is proposing doing, I believe.

It's what I'm proposing. The problem is to be alerted when a new process
is created in order to add it in the correct group of processes if the
parent belongs to one (or several) groups. The notification can be done
with the fork connector patch. 
I am not quite comfortable of ELSA requesting a fork hook this way.
How many hooks in the stock kernel that are related to accounting? Can
anyone answer this question? I know of 'acct_process()' in exit.c used
by the BSD accounting and ELSA is requesting a hook in fork. If people
raise the same question again a few years later, how many people will
still remember this ELSA hook?
That was the reason i thought a central piece was a good idea. I
would rather see the fork hook is coded in acct.c and then invokes
a routine that handles what ELSA needs. If CSA would adopt the ELSA's
daemon's approach, CSA may also need to use the fork hook.
Actually the acct_process() was modified not long ago to become
a wrapper, which then invokes do_acct_process() which is completely
BSD specific. The fork hook can be the same.
- jay

(In fact, why do we even need the notifications?  /bin/ps can work this
stuff out).

Yes it can but the risk is to lose some forks no? 
I think that /bin/ps is using the /proc interface. If we're polling
the /proc to catch process creation we may lost some of them. With the
fork connector we catch all forks and we can check that by using the
sequence number (incremented by each fork) of the message.

Guillaume
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-22 Thread Jay Lan
Kaigai Kohei wrote:
Hello, everyone.
Andrew Morton wrote:
 > Jay Lan <[EMAIL PROTECTED]> wrote:
 >
 >>Since the need of Linux system accounting has gone beyond what BSD
 >>accounting provides, i think it is a good idea to create a thin layer
 >>of common code for various accounting packages, such as BSD accounting,
 >>CSA, ELSA, etc. The hook to do_exit() at exit.c was changed to invoke
 >>a routine in the common code which would then invoke those accounting
 >>packages that register to the acct_common to handle do_exit situation.
 >
 >
 > This all seems to be heading in the wrong direction.  Do we really 
want to
 > have lots of different system accounting packages all hooking into a
 > generic we-cant-decide-what-to-do-so-we-added-some-pointless-overhead
 > framework?
 >
 > Can't we get _one_ accounting system in there, get it right, avoid the
 > framework?

I think there are two issues about system accounting framework.
Issue: 1) How to define the appropriate unit for accounting ?
Current BSD-accountiong make a collection per process accounting 
information.
CSA make additionally a collection per process-aggregation accounting.
The 'enhanced acct data collection' patches that were added to
2-6-11-rc* tree still do collection of per process data.
CSA added those per-process data to per-aggregation ("job") data
structure at do_exit() time when a process termintes.
It is appropriate to make the fork-exit event handling framework for 
definition
of the process-aggregation, such as PAGG.

This system-accounting per process-aggregation is quite useful,
thought I tried the SGI's implementation named 'job' in past days.
Issue: 2) What items should be collected for accounting information ?
BSD-accounting collects PID/UID/GID, User/Sys/Elapsed-Time, and # of
minor/major page faults. SGI's CSA collects VM/RSS size on exit time,
Integral-VM/RSS, and amount of block-I/O additionally.
These data are now collected in 2.6.11-rc* code. Note that these data
are still per-process.
I think it's hard to implement the accounting-engine as a kernel loadable
module using any kinds of framework. Because, we must put callback 
functions
into all around the kernel for this purpose.

Thus, I make a proposion as follows:
We should separate the process-aggregation functionality and collecting
accounting informations.
I totally agree with this! Actually that was what we have done. The data
collection part of code has been unified.
Something of framework to implement process-aggregation is necessary.
And, making a collection of accounting information should be merged
into BSD-accounting and implemented as a part of monolithic kernel
as Guillaume said.
This sounds good. I am interested in learning how ELSA saves off
the per-process accounting data before the data got disposed. If
that scheme works for CSA, we would be very happy to adopt the
scheme. The current BSD scheme is very insufficient. The code is
very BSD centric and it provides no way to handle process-aggregation.
Thanks,
 - jay
Thanks.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Lse-tech] Re: A common layer for Accounting packages

2005-02-22 Thread Jay Lan
Guillaume Thouvenin wrote:
On Fri, 2005-02-18 at 17:16 -0800, Andrew Morton wrote:
Jay Lan <[EMAIL PROTECTED]> wrote:
Since the need of Linux system accounting has gone beyond what BSD
accounting provides, i think it is a good idea to create a thin layer
of common code for various accounting packages, such as BSD accounting,
CSA, ELSA, etc. The hook to do_exit() at exit.c was changed to invoke
a routine in the common code which would then invoke those accounting
packages that register to the acct_common to handle do_exit situation.
This all seems to be heading in the wrong direction.  Do we really want to
have lots of different system accounting packages all hooking into a
generic we-cant-decide-what-to-do-so-we-added-some-pointless-overhead
framework?
Can't we get _one_ accounting system in there, get it right, avoid the
framework?

  Is it possible to just merge the BSD accounting and the CSA accounting
by adding in the current BSD per-process accounting structure some
missing fields like the mm integral provided by the CSA patch?
Hi Guillaume,
All raw data CSA needs already stored in task_struct of the process.
ELSA is just a user of the accounting data. We need a hook in the
do_fork() routine to manage group of processes, not to do accounting.
I see at least three layers of functions in doing system accounting:
data collection, handling of the raw data, and presentation of the
data to users.
We have merged the data collection part. :)
Handling of the raw data seems done in ELSA by user spaced daemon
and you are proposing to add a hook at fork time. I am interested
in learning your approach. How ELSA adds per process accounting data
to your grouping (banks) when a process exit? How do you save
accounting data you need in task_struct before it is disposed? BSD
handles that through acct_process() hook at do_exit(). CSA also
depends on a hook at do_exit() to merge per-process data to per-job
data. How does ELSA handle this without a need of a do_exit() hook?
Thanks,
 - jay
Guillaume

---
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
___
Lse-tech mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/lse-tech
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


A common layer for Accounting packages

2005-02-18 Thread Jay Lan
It started with the need of CSA to handle end-of-process (eop) at
do_exit() at exit.c. The hook at exit.c was BSD Accounting specific.
Since the need of Linux system accounting has gone beyond what BSD
accounting provides, i think it is a good idea to create a thin layer
of common code for various accounting packages, such as BSD accounting,
CSA, ELSA, etc. The hook to do_exit() at exit.c was changed to invoke
a routine in the common code which would then invoke those accounting
packages that register to the acct_common to handle do_exit situation.
Here is the description of this acct_common patch:
1) two new files at include/linux/acct_common.h and kernel/acct_common.c
2) A new config flag CONFIG_ACCT_COMMON is created and
   CONFIG_BSD_PROCESS_ACCT and a future CSA config flag depend on it.
   I think it is a good idea to always have acct_common in the kernel;
   in that case, the new config flag may not be really necessary. I can
   go either way.
3) Accounting packages can register themselves to acct_common for
   callbacks. Only do_exit handling is defined now. BSD acct.c has been
   modified to register/unergister to acct_common.
4) The 'enhanced acct data collection' routines have been moved from
   acct.c to acct_common.c. Files used to #include  were
   modified to #include .
This patch was generated against 2.6.11-rc3-mm2.
Signed-off-by: Jay Lan <[EMAIL PROTECTED]>
Index: linux/kernel/fork.c
===
--- linux.orig/kernel/fork.c2005-02-17 19:24:54.388927143 -0800
+++ linux/kernel/fork.c 2005-02-18 09:59:14.179816325 -0800
@@ -40,7 +40,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 #include 
 #include 
Index: linux/fs/exec.c
===
--- linux.orig/fs/exec.c2005-02-18 05:45:19.868493728 -0800
+++ linux/fs/exec.c 2005-02-18 09:59:14.196418021 -0800
@@ -47,7 +47,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 #include 
 #include 
Index: linux/kernel/exit.c
===
--- linux.orig/kernel/exit.c2005-02-17 19:24:54.380138011 -0800
+++ linux/kernel/exit.c 2005-02-18 10:05:43.506174767 -0800
@@ -17,7 +17,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -814,7 +814,7 @@ fastcall NORET_TYPE void do_exit(long co
group_dead = atomic_dec_and_test(&tsk->signal->live);
if (group_dead) {
del_timer_sync(&tsk->signal->real_timer);
-   acct_process(code);
+   acct_do_exit(code, tsk);
}
exit_mm(tsk);
 
Index: linux/include/linux/acct_common.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux/include/linux/acct_common.h   2005-02-18 12:41:28.232626074 -0800
@@ -0,0 +1,83 @@
+/*
+ *  Copyright (c) 2005 Silicon Graphics, Inc.
+ *  All rights reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, write to the Free Software
+ *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ *
+ * Jay Lan <[EMAIL PROTECTED]>
+ */
+
+/*
+ *  include/linux/acct_common.h
+ *  
+ *  Common Accounting for Linux - Definitions
+ *
+ *  02/17/05 Jay Lan <[EMAIL PROTECTED]>. Initial creation of the header file.
+ *  This header file contains the definitions needed to support
+ *  registration/unregistration of various accounting agents to
+ *  the kernel. It also contains prototypes of common routines.
+ *
+ */
+
+#ifndef _LINUX_ACCT_COMMON_H
+#define _LINUX_ACCT_COMMON_H
+
+#include 
+#include 
+#include 
+
+
+/*
+ * Various accounting modules ID's
+ */
+#define ACCT_BSD   0
+#define ACCT_CSA   1
+#define ACCT_ELSA  2
+#define LAST_ACCT_ID   ACCT_ELSA
+#define ACCT_PKG_COUNT (LAST_ACCT_ID+1)
+
+/* 
+ * Used by accounting packages to define the callback functions into the
+ * packages.
+ *
+ * STRUCT MEMBERS:
+ *   pkg_id:   Accounting package ID. This should be set the package
+ * when register.
+ *   do_exit:  Function pointer to function used when do_exit()
+ * hook is needed.
+ * This is optional and may be set to NULL if it is
+ *

initialize_acct_integrals

2005-02-18 Thread Jay Lan
Hi Andrew,
The new "move-accounting-function-calls-out-of-critical-vm-code-paths"
patch in 2.6.11-rc3-mm2 was different from the code i tested.
In particular, it mistakenly dropped the accounting routine calls
in fs/exec.c. The calls in do_execve() are needed to properly
initialize accounting fields. Specifically, the tsk->acct_stimexpd
needs to be initialized to tsk->stime.
I have discussed this with Christoph Lameter and he gave me full
blessings to bring the calls back.
This initialize_acct_integrals patch was generated against
2.6.11-rc3-mm2 to fix the problem. Thanks!
Signed-off-by: Jay Lan <[EMAIL PROTECTED]>
Index: linux/fs/exec.c
===
--- linux.orig/fs/exec.c2005-02-17 19:24:45.785343748 -0800
+++ linux/fs/exec.c 2005-02-18 05:45:19.868493728 -0800
@@ -1193,6 +1193,8 @@ int do_execve(char * filename,
 
/* execve success */
security_bprm_free(bprm);
+   acct_update_integrals(current);
+   update_mem_hiwater(current);
kfree(bprm);
return retval;
}


Re: move-accounting-function-calls-out-of-critical-vm-code-paths.patch

2005-02-07 Thread Jay Lan
Andrew Morton wrote:
Christoph Lameter <[EMAIL PROTECTED]> wrote:
I hope that Roland's changes for higher resolution of cputime would
make that possible. But this is Jay's thing not mine. I just want to make
sure that the CSA patches does not get in the way of our attempts to
improve the performance of the page fault handler. In the discussions on
linux-mm there was also some concern about adding these calls.

Well your patch certainly cleans things up in there and would be a good
thing to have as long as we can be sure that it doesn't break the
accounting in some subtle way.
Which implies that we need to see some additional accounting code, so we
can verify that the base accumulation infrastructure is doing the expected
thing.  As well as an ack from the interested parties.  Does anyone know
what's happening with all the new accounting initiatives?  I'm seeing no
activity at all.
Sorry guys! I have been away for three weeks on short term disability.:(
I have tested Christoph's patch before the leave. It did work for CSA
and showed performance improvement on certain configuration.
CSA is currently implemented as a loadable module. I think ELSA is the
same, right? The use of the enhanced accounting data collection
code is not in the kernel tree. That was why Andrew did not see usage of 
the accounting patches. Should i propose to include the CSA module in
the kernel then, Andrew? :)

Cheers,
 - jay
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/