Re: [PATCH] fs/jffs2: Add missing call to posix_acl_release
Julia Lawall wrote: From: Julia Lawall <[EMAIL PROTECTED]> posix_acl_clone does a memory allocation and sets a reference count, so posix_acl_release is needed afterwards to free it. The problem was fixed using the following semantic patch. (http://www.emn.fr/x-info/coccinelle/) // @@ type T; identifier E; expression E1, E2; int ret; statement S; @@ T E; <+... ( E = \(posix_acl_clone\|posix_acl_alloc\|posix_acl_dup\)(...); if (E == NULL) S | if ((E = \(posix_acl_clone\|posix_acl_alloc\|posix_acl_dup\)(...)) == NULL) S ) ... when != E2 = E when strict ( posix_acl_release(E); | E1 = E; | + posix_acl_release(E); return; | + posix_acl_release(E); return ret; ) ...+> // Signed-off-by: Julia Lawall <[EMAIL PROTECTED]> --- diff -u -p a/fs/jffs2/acl.c b/fs/jffs2/acl.c --- a/fs/jffs2/acl.c 2008-01-03 09:49:31.0 +0100 +++ b/fs/jffs2/acl.c 2008-01-06 17:38:52.0 +0100 @@ -345,8 +345,10 @@ int jffs2_init_acl_pre(struct inode *dir if (!clone) return -ENOMEM; rc = posix_acl_create_masq(clone, (mode_t *)i_mode); - if (rc < 0) + if (rc < 0) { + posix_acl_release(clone); return rc; + } if (rc > 0) jffs2_iset_acl(inode, &f->i_acl_access, clone); Indeed, there was a possibility to cause memory leaking. Acked-by: KaiGai Kohei <[EMAIL PROTECTED]> -- OSS Platform Development Division, NEC KaiGai Kohei <[EMAIL PROTECTED]> -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Exporting capability code/name pairs
James Morris wrote: > On Wed, 2 Jan 2008, KaiGai Kohei wrote: > >>> Another issue is that securityfs depends on CONFIG_SECURITY, which might be >>> undesirable, given that capabilities are a standard feature. >> We can implement this feature on another pseudo filesystems. >> Do you think what filesystem is the best candidate? >> I prefer procfs or sysfs instead. > > Sysfs makes more sense, as this information is system-wide and does not > relate to specific processes. The following patch enables to export any capabilities which are supported on the working kernel, under the /sys/kernel/capability. You can obtain the code/name pairs of capabilities with scanning this directory. Signed-off-by: KaiGai Kohei <[EMAIL PROTECTED]> -- kernel/Makefile |9 + kernel/capability.c | 36 scripts/mkcapnames.sh | 45 + 3 files changed, 90 insertions(+), 0 deletions(-) diff --git a/kernel/Makefile b/kernel/Makefile index dfa9695..29cd3ac 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -80,3 +80,12 @@ quiet_cmd_ikconfiggz = IKCFG $@ targets += config_data.h $(obj)/config_data.h: $(obj)/config_data.gz FORCE $(call if_changed,ikconfiggz) + +# cap_names.h contains the code/name pair of capabilities. +# It is generated using include/linux/capability.h automatically. +$(obj)/capability.o: $(obj)/cap_names.h +quiet_cmd_cap_names = CAPS$@ + cmd_cap_names = /bin/sh $(src)/../scripts/mkcapnames.sh > $@ +targets += cap_names.h +$(obj)/cap_names.h: $(src)/../scripts/mkcapnames.sh $(src)/../include/linux/capability.h FORCE + $(call if_changed,cap_names) diff --git a/kernel/capability.c b/kernel/capability.c index efbd9cd..14b4f4b 100644 --- a/kernel/capability.c +++ b/kernel/capability.c @@ -245,3 +245,39 @@ int capable(int cap) return __capable(current, cap); } EXPORT_SYMBOL(capable); + +/* + * Export the list of capabilities on /sys/kernel/capability + */ +#define SYSFS_CAPABILITY_ENTRY(_name, _fmt, ...) \ + static ssize_t _name##_show(struct kset *kset, char *buffer)\ + { \ + return scnprintf(buffer, PAGE_SIZE, _fmt, __VA_ARGS__); \ + } \ + static struct subsys_attribute _name##_attr = __ATTR_RO(_name) + +/* + * capability_attrs[] is generated automatically by scripts/mkcapnames.sh + * This script parses include/linux/capability.h + */ +#include "cap_names.h" + +static struct attribute_group capability_attr_group = { + .name = "capability", + .attrs = capability_attrs, +}; + +static int __init capability_export_names(void) +{ + int rc; + + rc = sysfs_create_group(&kernel_subsys.kobj, + &capability_attr_group); + if (rc) { + printk(KERN_ERR "Unable to export capabilities\n"); + return rc; + } + + return 0; +} +__initcall(capability_export_names); diff --git a/scripts/mkcapnames.sh b/scripts/mkcapnames.sh index e69de29..c1c8b1f 100644 --- a/scripts/mkcapnames.sh +++ b/scripts/mkcapnames.sh @@ -0,0 +1,45 @@ +#!/bin/sh + +# +# generate a cap_names.h file from include/linux/capability.h +# + +BASEDIR=`dirname $0` + +echo '#ifndef CAP_NAMES_H' +echo '#define CAP_NAMES_H' +echo +echo '#ifndef SYSFS_CAPABILITY_ENTRY' +echo '#error cap_names.h should be included from kernel/capability.c' +echo '#else' + +echo 'SYSFS_CAPABILITY_ENTRY(version, "0x%08x\n", _LINUX_CAPABILITY_VERSION);' + +cat ${BASEDIR}/../include/linux/capability.h \ +| egrep '^#define CAP_[A-Z_]+[ ]+[0-9]+$' \ +| awk 'BEGIN { +max_code = -1; +} +{ +if ($3 > max_code) +max_code = $3; +printf("\tSYSFS_CAPABILITY_ENTRY(%s, \"%%u\\n\", %s);\n", tolower($2), $2); +} +END { +printf("\tSYSFS_CAPABILITY_ENTRY(index, \"%%u\\n\", %u);\n", max_code); +}' + +echo +echo 'static struct attribute *capability_attrs[] = {' +echo '&version_attr.attr,' +echo '&index_attr.attr,' + +cat ${BASEDIR}/../include/linux/capability.h\ +| egrep '^#define CAP_[A-Z_]+[ ]+[0-9]+$' \ +| awk '{ printf ("&%s_attr.attr,\n", tolower($2)); }' + +echo 'NULL,' +echo '};' + +echo '#endif /* SYSFS_CAPABILITY_ENTRY */' +echo '#endif /* CAP_NAMES_H */' -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Exporting capability code/name pairs
> There is also the issue of compiled code which explicitly raises and > lowers capabilities around critical code sections (ie., as they were > intended to be used) is also not well served by this change. > > That is, unless the code was compiled with things like CAP_MAC_ADMIN > being #define'd then it won't be able to do things like cap_set_flag() > appropriately. A new function will be necessary, like: cap_value_t cap_get_value_by_name(const char *cap_name); If a program tries to use specific capabilities explicitly, it can apply pre-defined capability code like CAP_MAC_ADMIN. However, when we implement a program which can deal with arbitrary capabilities, it should obtain codes of them dynamically, because it enables to work itself on the feature kernel. Thanks, > Cheers > > Andrew > > KaiGai Kohei wrote: >> Andrew Morgan wrote: >>> -----BEGIN PGP SIGNED MESSAGE- >>> Hash: SHA1 >>> >>> KaiGai Kohei wrote: >>>> Remaining issues: >>>> - We have to mount securityfs explicitly, or use /etc/fstab. >>>> It can cause a matter when we want to use this feature on >>>> very early phase on boot. (like /sbin/init) >>> I'm not altogether clear how you intend this to work. >>> >>> Are you saying that some future version of libcap will require that >>> securityfs be mounted before it (libcap) will work? >> Yes, but implementing this feature on securityfs might be not good >> idea as as James said. If this feature is on procfs or sysfs, it is >> not necessary to mount securityfs explicitly. >> >> Thanks, > -BEGIN PGP SIGNATURE- > Version: GnuPG v1.4.7 (Darwin) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iD8DBQFHfD7n+bHCR3gb8jsRAsgcAKDY6qXBR3JV2addHUg5sxyZHJEkDQCfdLOL > zJlf3T4SQsUNENr3kwR5pr8= > =v8C5 > -END PGP SIGNATURE- > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Exporting capability code/name pairs
Andrew Morgan wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > KaiGai Kohei wrote: >> Remaining issues: >> - We have to mount securityfs explicitly, or use /etc/fstab. >> It can cause a matter when we want to use this feature on >> very early phase on boot. (like /sbin/init) > > I'm not altogether clear how you intend this to work. > > Are you saying that some future version of libcap will require that > securityfs be mounted before it (libcap) will work? Yes, but implementing this feature on securityfs might be not good idea as as James said. If this feature is on procfs or sysfs, it is not necessary to mount securityfs explicitly. Thanks, -- KaiGai Kohei <[EMAIL PROTECTED]> -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Exporting capability code/name pairs
James Morris wrote: On Fri, 28 Dec 2007, KaiGai Kohei wrote: Remaining issues: - We have to mount securityfs explicitly, or use /etc/fstab. It can cause a matter when we want to use this feature on very early phase on boot. (like /sbin/init) Why can't early userspace itself mount securityfs? Hmm,,, It might be possible as load_policy() doing, if necessary. Please forget the previous my opinion. I'm not even sure this is a good idea at all. Existing capabilities will never disappear, and, as with syscalls, it's probably up to userland to handle new ones not existing. When we use libcap built on older kernel for newer kernel, libcap cannot handle newly added capabilities, because it is not exist on the build environment. Therefore, any available capabilities should be exported dynamically by the kernel. In any case, some more technical issues: kernel/cap_names.sh generates the body of cap_entries[] array, This needs to be in the scripts directory. OK, it will be moved. The generated header should be made idempotent (#ifdef wrapping), and also include a warning that it is automatically generated (identifying the script which does so), and that is should not be edited. + d_caps = securityfs_create_dir("capability", NULL); + if (!d_caps) Wrong way to check for error -- the function returns an ERR_PTR(). + f_caps[i] = securityfs_create_file(cap_entries[i].name, 0444, + d_caps, &cap_entries[i], + &cap_entry_fops); + if (!f_caps[i]) Ditto. OK, Another issue is that securityfs depends on CONFIG_SECURITY, which might be undesirable, given that capabilities are a standard feature. We can implement this feature on another pseudo filesystems. Do you think what filesystem is the best candidate? I prefer procfs or sysfs instead. Thanks, -- KaiGai Kohei <[EMAIL PROTECTED]> -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Exporting capability code/name pairs
James Morris wrote: On Fri, 28 Dec 2007, KaiGai Kohei wrote: + snprintf(tmp, sizeof(tmp), +cap_entry == &cap_entries[0] ? "0x%08x" : "%u", +cap_entry->code); + len = strlen(tmp); You don't need to call strlen(), just use scnprintf() and grab the return value. Thanks for your suggestion, I'll fix it on the next posting. -- OSS Platform Development Division, NEC KaiGai Kohei <[EMAIL PROTECTED]> -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Exporting capability code/name pairs
The attached patch enables to export capability code/name pairs under /capability of securityfs (revision 2). Inprovements from the first revison: - simple_read_from_buffer() is used for read method. - cap_entries[] array is generated from include/linux/capability.h automatically. Remaining issues: - We have to mount securityfs explicitly, or use /etc/fstab. It can cause a matter when we want to use this feature on very early phase on boot. (like /sbin/init) It was also concerned at the past. http://marc.info/?l=linux-kernel&m=112063963623190&w=2 How do you think an idea that the root of securityfs is mounted on kernel/security of sysfs when it is initialized? If securityfs got being available when kernel initializing process, we can always use this feature and the matter will be resolved. >>> +static struct cap_entry_data cap_entries[] = { >>> +/* max number of supported format */ >>> +{ _LINUX_CAPABILITY_VERSION,"version" }, >>> +/* max number of capability */ >>> +{ CAP_LAST_CAP,"index" }, >>> +/* list of capabilities */ >>> +{ CAP_CHOWN,"cap_chown" }, >>> +{ CAP_DAC_OVERRIDE,"cap_dac_override" }, > - snip - >>> +{ CAP_MAC_OVERRIDE,"cap_mac_override" }, >>> +{ CAP_MAC_ADMIN,"cap_mac_admin" }, >>> +{ -1,NULL}, >>> +}; >> >> I don't like this duplication with the list in >> include/linux/capability.h. >> Now when a new cap is added, it needs to be >> >> 1. added to capability.h >> 2. swapped as the new CAP_LAST_CAP in capability.h >> 3. added to this list... >> >> Could you integrate the two lists (not sure how offhand), or at least >> put them in the same place? kernel/cap_names.sh generates the body of cap_entries[] array, and it is invoked when we make the kernel. Signed-off-by: KaiGai Kohei <[EMAIL PROTECTED]> --- Makefile |9 +++ cap_names.sh | 21 capability.c | 75 +++ 3 files changed, 105 insertions(+) diff --git a/kernel/Makefile b/kernel/Makefile index dfa9695..45d6034 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -80,3 +80,12 @@ quiet_cmd_ikconfiggz = IKCFG $@ targets += config_data.h $(obj)/config_data.h: $(obj)/config_data.gz FORCE $(call if_changed,ikconfiggz) + +# cap_names.h contains the code/name pair of capabilities. +# It is generated using include/linux/capability.h automatically. +$(obj)/capability.o: $(obj)/cap_names.h +quiet_cmd_cap_names = CAPS$@ + cmd_cap_names = /bin/sh $(src)/cap_names.sh > $@ +targets += cap_names.h +$(obj)/cap_names.h: $(src)/cap_names.sh $(src)/../include/linux/capability.h FORCE + $(call if_changed,cap_names) diff --git a/kernel/cap_names.sh b/kernel/cap_names.sh index e69de29..7b2fcfe 100644 --- a/kernel/cap_names.sh +++ b/kernel/cap_names.sh @@ -0,0 +1,21 @@ +#!/bin/sh + +# +# generate a cap_names.h file from include/linux/capability.h +# + +BASEDIR=`dirname $0` + +cat ${BASEDIR}/../include/linux/capability.h \ +| egrep '^#define CAP_[A-Z_]+[ ]+[0-9]+$' \ +| awk 'BEGIN { + max_code = -1; + } + { + if ($3 > max_code) + max_code = $3; + printf("\t{ %s, \"%s\" },\n", $2, tolower($2)); + } + END { + printf("\t{ %u, \"index\" },\n", max_code); + }' diff --git a/kernel/capability.c b/kernel/capability.c index efbd9cd..03a9b62 100644 --- a/kernel/capability.c +++ b/kernel/capability.c @@ -245,3 +245,78 @@ int capable(int cap) return __capable(current, cap); } EXPORT_SYMBOL(capable); + +/* + * capability code/name pairs are exported under /sys/security/capability/ + */ +struct cap_entry_data { + unsigned int code; + const char *name; +}; + +static struct cap_entry_data cap_entries[] = { + /* max number of supported format */ + { _LINUX_CAPABILITY_VERSION, "version" }, + /* list of capabilities */ +#include "cap_names.h" + { -1, NULL}, +}; + +static ssize_t cap_entry_read(struct file *file, char __user *buffer, + size_t count, loff_t *ppos) +{ + struct cap_entry_data *cap_entry; + char tmp[32]; + int len; + + cap_entry = file->f_dentry->d_inode->i_private; + BUG_ON(!cap_entry); + + snprintf(tmp, sizeof(tmp), +cap_entry == &cap_entries[0] ? "0x%08x" : "%u", +cap_entry->code); + len = strlen(tmp); + + return simple_r
Re: [PATCH] Exporting capability code/name pairs
Serge E. Hallyn wrote: Quoting KaiGai Kohei ([EMAIL PROTECTED]): This patch enables to export the code/name pairs of capabilities under /capability of securityfs. In the current libcap, it obtains the list of capabilities from header file on the build environment statically. However, it is not enough portable between different versions of kernels, because an already built libcap cannot have knowledge about new added capabilities. Dynamic collection of code/name pairs of capabilities will resolve this matter. But it is not perfect one. I have a bit concern about this patch now. 1. I want to generate cap_entries array from linux/capability.h automatically. Is there any good idea? 2. We have to mount securityfs explicitly, or using /etc/fstab. It can make a matter when we want to use this features in very early boot sequence. Any comment please. I like the idea, but - snip - +/* + * capability code/name pairs are exported under /sys/security/capability/ + */ +struct cap_entry_data { + unsigned int code; + const char *name; +}; + +static struct cap_entry_data cap_entries[] = { + /* max number of supported format */ + { _LINUX_CAPABILITY_VERSION,"version" }, + /* max number of capability */ + { CAP_LAST_CAP, "index" }, + /* list of capabilities */ + { CAP_CHOWN,"cap_chown" }, + { CAP_DAC_OVERRIDE, "cap_dac_override" }, - snip - + { CAP_MAC_OVERRIDE, "cap_mac_override" }, + { CAP_MAC_ADMIN,"cap_mac_admin" }, + { -1, NULL}, +}; I don't like this duplication with the list in include/linux/capability.h. Now when a new cap is added, it needs to be 1. added to capability.h 2. swapped as the new CAP_LAST_CAP in capability.h 3. added to this list... Could you integrate the two lists (not sure how offhand), or at least put them in the same place? The following script will generate cap_entries[] array. cat include/linux/capability.h \ | egrep '^#define[ \t]+CAP_[A-Z_]+[ \t]+[0-9]+$' \ | awk '{ printf("\t{ %s, \"\" },\n", $2, tolower($2)); }' It is nice to include the result of this script, like as: static struct cap_entry_data cap_entries[] = { /* max number of supported format */ { _LINUX_CAPABILITY_VERSION,"version" }, /* max number of capability */ { CAP_LAST_CAP, "index" }, #include "capability_names.h" { -1, NULL }, }; I guess we can put this script on Makefile like as kernel/timeconst.pl doing. Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei <[EMAIL PROTECTED]> -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Exporting capability code/name pairs
This patch enables to export the code/name pairs of capabilities under /capability of securityfs. In the current libcap, it obtains the list of capabilities from header file on the build environment statically. However, it is not enough portable between different versions of kernels, because an already built libcap cannot have knowledge about new added capabilities. Dynamic collection of code/name pairs of capabilities will resolve this matter. But it is not perfect one. I have a bit concern about this patch now. 1. I want to generate cap_entries array from linux/capability.h automatically. Is there any good idea? 2. We have to mount securityfs explicitly, or using /etc/fstab. It can make a matter when we want to use this features in very early boot sequence. Any comment please. usage: --- # mount -t securityfs none /sys/kernel/security # cd /sys/kernel/security/capability # ls cap_audit_controlcap_kill cap_setpcap cap_sys_rawio cap_audit_write cap_lease cap_setuid cap_sys_resource cap_chowncap_linux_immutable cap_sys_admin cap_sys_time cap_dac_override cap_mknod cap_sys_bootcap_sys_tty_config cap_dac_read_search cap_net_admin cap_sys_chroot index cap_fowner cap_net_bind_service cap_sys_module version cap_fsetid cap_net_broadcast cap_sys_nice cap_ipc_lock cap_setfcap cap_sys_pacct cap_ipc_ownercap_setgidcap_sys_ptrace # cat cap_audit_write ; echo 29 # cat cap_sys_chroot ; echo 18 # cat version ; echo 0x19980330 # cat index; echo 31 # -- OSS Platform Development Division, NEC KaiGai Kohei <[EMAIL PROTECTED]> Signed-off-by: KaiGai Kohei <[EMAIL PROTECTED]> --- capability.c | 127 +++ 1 file changed, 127 insertions(+) diff --git a/kernel/capability.c b/kernel/capability.c index efbd9cd..5d9bf53 100644 --- a/kernel/capability.c +++ b/kernel/capability.c @@ -245,3 +245,131 @@ int capable(int cap) return __capable(current, cap); } EXPORT_SYMBOL(capable); + +/* + * Capability code/name pair exporting + */ + +/* + * capability code/name pairs are exported under /sys/security/capability/ + */ +struct cap_entry_data { + unsigned int code; + const char *name; +}; + +static struct cap_entry_data cap_entries[] = { + /* max number of supported format */ + { _LINUX_CAPABILITY_VERSION,"version" }, + /* max number of capability */ + { CAP_LAST_CAP, "index" }, + /* list of capabilities */ + { CAP_CHOWN,"cap_chown" }, + { CAP_DAC_OVERRIDE, "cap_dac_override" }, + { CAP_DAC_READ_SEARCH, "cap_dac_read_search" }, + { CAP_FOWNER, "cap_fowner" }, + { CAP_FSETID, "cap_fsetid" }, + { CAP_KILL, "cap_kill" }, + { CAP_SETGID, "cap_setgid" }, + { CAP_SETUID, "cap_setuid" }, + { CAP_SETPCAP, "cap_setpcap" }, + { CAP_LINUX_IMMUTABLE, "cap_linux_immutable" }, + { CAP_NET_BIND_SERVICE, "cap_net_bind_service" }, + { CAP_NET_BROADCAST,"cap_net_broadcast" }, + { CAP_NET_ADMIN,"cap_net_admin" }, + { CAP_NET_RAW, "cap_net_admin" }, + { CAP_IPC_LOCK, "cap_ipc_lock" }, + { CAP_IPC_OWNER,"cap_ipc_owner" }, + { CAP_SYS_MODULE, "cap_sys_module" }, + { CAP_SYS_RAWIO,"cap_sys_rawio" }, + { CAP_SYS_CHROOT, "cap_sys_chroot" }, + { CAP_SYS_PTRACE, "cap_sys_ptrace" }, + { CAP_SYS_PACCT,"cap_sys_pacct" }, + { CAP_SYS_ADMIN,"cap_sys_admin" }, + { CAP_SYS_BOOT, "cap_sys_boot" }, + { CAP_SYS_NICE, "cap_sys_nice" }, + { CAP_SYS_RESOURCE, "cap_sys_resource" }, + { CAP_SYS_TIME, "cap_sys_time" }, + { CAP_SYS_TTY_CONFIG, "cap_sys_tty_config" }, + { CAP_MKNOD,"cap_mknod" }, + { CAP_LEASE,"cap_lease" }, + { CAP_AUDIT_WRITE, "cap_audit_write" }, + { CAP_AUDIT_CONTROL,"cap_audit_control" }, + { CAP_SETFCAP, "cap_setfcap" }, + { CAP_MAC_OVERRIDE, "cap_mac_override" }, + { CAP_MAC_ADM
Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)
Sorry, any TABs are replaced by MUA. I'll send the patch again. > The attached patch provides several improvement for pam_cap module. > 1. It enables pam_cap to drop capabilities from process'es capability >bounding set. > 2. It enables to specify allowing inheritable capability set or dropping >bounding capability set for groups, not only users. > 3. It provide pam_sm_session() method, not only pam_sm_authenticate() >and pam_sm_setcred(). A system administrator can select more >appropriate mode for his purpose. > 4. In the auth/cred mode, it enables to cache the configuration file, >to avoid read and analyze it twice. > (Therefore, most of the part in the original one got replaced) > > The default configuration file is "/etc/security/capability.conf". > You can describe as follows: > > # kaigai get cap_net_raw and cap_kill, tak get cap_sys_pacct pI. > # We can omit "i:" in the head of each line. > i:cap_net_raw,cap_kill kaigai > cap_sys_pacct tak > > # ymj and tak lost cap_sys_chroot from cap_bset > b:cap_sys_chroot ymj tak > > # Any user within webadm group get cap_net_bind_service pI. > i:cap_net_bind_service @webadm > > # Any user within users group lost cap_sys_module from cap_bset > b:cap_sys_module @users > > > When a user or groups he belongs is on several lines, all configurations > are simplly compounded. > > In the above example, if tak belongs to webadm and users group, > he will get cap_sys_pacct and cap_net_bind_service pI, and lost > cap_sys_chroot and cap_sys_module from his cap_bset. > > Thanks, Signed-off-by: KaiGai Kohei <[EMAIL PROTECTED]> -- pam_cap/capability.conf |6 + pam_cap/pam_cap.c | 495 --- 2 files changed, 305 insertions(+), 196 deletions(-) diff --git a/pam_cap/capability.conf b/pam_cap/capability.conf index b543142..707cdc3 100644 --- a/pam_cap/capability.conf +++ b/pam_cap/capability.conf @@ -24,6 +24,12 @@ cap_setfcap morgan ## 'everyone else' gets no inheritable capabilities none * +# user 'kaigai' lost CAP_NET_RAW capability from bounding set +b:cap_net_raw kaigai + +# group 'acctadm' get CAP_SYS_PACCT inheritable capability +i:cap_sys_pacct @acctadm + ## if there is no '*' entry, all users not explicitly mentioned will ## get all available capabilities. This is a permissive default, and ## probably not what you want... diff --git a/pam_cap/pam_cap.c b/pam_cap/pam_cap.c index 94c5ebc..a917d5c 100644 --- a/pam_cap/pam_cap.c +++ b/pam_cap/pam_cap.c @@ -1,5 +1,6 @@ /* * Copyright (c) 1999,2007 Andrew G. Morgan <[EMAIL PROTECTED]> + * Copyright (c) 2007 KaiGai Kohei <[EMAIL PROTECTED]> * * The purpose of this module is to enforce inheritable capability sets * for a specified user. @@ -13,298 +14,400 @@ #include #include #include +#include +#include #include +#include #include #include +#define MODULE_NAME"pam_cap" #define USER_CAP_FILE "/etc/security/capability.conf" #define CAP_FILE_BUFFER_SIZE4096 #define CAP_FILE_DELIMITERS " \t\n" -#define CAP_COMBINED_FORMAT "%s all-i %s+i" -#define CAP_DROP_ALL"%s all-i" + +#ifndef PR_CAPBSET_DROP +#define PR_CAPBSET_DROP24 +#endif + +extern char const *_cap_names[]; struct pam_cap_s { int debug; const char *user; const char *conf_filename; +/* set in read_capabilities_for_user() */ +cap_t result; +int do_set_inh : 1; +int do_set_bset : 1; }; -/* obtain the inheritable capabilities for the current user */ - -static char *read_capabilities_for_user(const char *user, const char *source) +/* obtain the inheritable/bounding capabilities for the current user */ +static int read_capabilities_for_user(struct pam_cap_s *pcs) { -char *cap_string = NULL; -char buffer[CAP_FILE_BUFFER_SIZE], *line; +char buffer[CAP_FILE_BUFFER_SIZE]; FILE *cap_file; +struct passwd *pwd; +int line_num = 0; +int rc = -1; /* PAM_(AUTH|CRED|SESSION)_ERR */ + +pwd = getpwnam(pcs->user); +if (!pwd) { + syslog(LOG_ERR, "user %s not in passwd entries", pcs->user); + return PAM_AUTH_ERR; +} -cap_file = fopen(source, "r"); -if (cap_file == NULL) { - D(("failed to open capability file")); - return NULL; +cap_file = fopen(pcs->conf_filename, "r"); +if (!cap_file) { + if (errno == ENOENT) { + syslog(LOG_NOTICE, "%s is not found", + pcs->conf_filename); + return PAM_IGNORE; + } else { +
Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)
Andrew Morgan wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 KaiGai Kohei wrote: BTW, could you tell me your intention about pam_cap.c is implemented with pam_sm_authenticate() and pam_sm_setcred()? I think it can be done with pam_sm_open_session(), and this approach enables to reduce the iteration of reading /etc/security/capability.conf. How do you think the idea? Good question! If you want to add session support you can. I'd prefer it if you retained support for the auth/cred API too: admin choice and all that. To remove the second read of the file, you can use a PAM data item to cache the desired capability info after the first read of the file. I added session API support with retaining auth/cred API, and PAM data item caching feature using pam_(get|set)_data. It enables to kill the seconf read of the configuration file. Thanks for your suggestion. Followings are detailed explanation and the patch. I implemented it as a credential module (which has to get the authentication return code right to make the credential stack execute correctly) because I think of capabilities as credentials. That being said, the credentials vs. session thing is not well delineated by many applications, so it is arguably useful to provide both interfaces for the admin to make use of on a per application basis. The attached patch provides several improvement for pam_cap module. 1. It enables pam_cap to drop capabilities from process'es capability bounding set. 2. It enables to specify allowing inheritable capability set or dropping bounding capability set for groups, not only users. 3. It provide pam_sm_session() method, not only pam_sm_authenticate() and pam_sm_setcred(). A system administrator can select more appropriate mode for his purpose. 4. In the auth/cred mode, it enables to cache the configuration file, to avoid read and analyze it twice. (Therefore, most of the part in the original one got replaced) The default configuration file is "/etc/security/capability.conf". You can describe as follows: # kaigai get cap_net_raw and cap_kill, tak get cap_sys_pacct pI. # We can omit "i:" in the head of each line. i:cap_net_raw,cap_kill kaigai cap_sys_pacct tak # ymj and tak lost cap_sys_chroot from cap_bset b:cap_sys_chroot ymj tak # Any user within webadm group get cap_net_bind_service pI. i:cap_net_bind_service @webadm # Any user within users group lost cap_sys_module from cap_bset b:cap_sys_module @users When a user or groups he belongs is on several lines, all configurations are simplly compounded. In the above example, if tak belongs to webadm and users group, he will get cap_sys_pacct and cap_net_bind_service pI, and lost cap_sys_chroot and cap_sys_module from his cap_bset. Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei <[EMAIL PROTECTED]> Signed-off-by: KaiGai Kohei <[EMAIL PROTECTED]> -- pam_cap/capability.conf |6 + pam_cap/pam_cap.c | 495 --- 2 files changed, 305 insertions(+), 196 deletions(-) diff --git a/pam_cap/capability.conf b/pam_cap/capability.conf index b543142..707cdc3 100644 --- a/pam_cap/capability.conf +++ b/pam_cap/capability.conf @@ -24,6 +24,12 @@ cap_setfcap morgan ## 'everyone else' gets no inheritable capabilities none * +# user 'kaigai' lost CAP_NET_RAW capability from bounding set +b:cap_net_raw kaigai + +# group 'acctadm' get CAP_SYS_PACCT inheritable capability +i:cap_sys_pacct @acctadm + ## if there is no '*' entry, all users not explicitly mentioned will ## get all available capabilities. This is a permissive default, and ## probably not what you want... diff --git a/pam_cap/pam_cap.c b/pam_cap/pam_cap.c index 94c5ebc..a917d5c 100644 --- a/pam_cap/pam_cap.c +++ b/pam_cap/pam_cap.c @@ -1,5 +1,6 @@ /* * Copyright (c) 1999,2007 Andrew G. Morgan <[EMAIL PROTECTED]> + * Copyright (c) 2007 KaiGai Kohei <[EMAIL PROTECTED]> * * The purpose of this module is to enforce inheritable capability sets * for a specified user. @@ -13,298 +14,400 @@ #include #include #include +#include +#include #include +#include #include #include +#define MODULE_NAME"pam_cap" #define USER_CAP_FILE "/etc/security/capability.conf" #define CAP_FILE_BUFFER_SIZE4096 #define CAP_FILE_DELIMITERS " \t\n" -#define CAP_COMBINED_FORMAT "%s all-i %s+i" -#define CAP_DROP_ALL"%s all-i" + +#ifndef PR_CAPBSET_DROP +#define PR_CAPBSET_DROP24 +#endif + +extern char const *_cap_names[]; struct pam_cap_s { int debug; const char *user; const char *conf_filename; +/* set in read_capabilities_for_user() */ +cap_
Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)
Andrew Morgan wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 KaiGai Kohei wrote: Andrew Morgan wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 KaiGai Kohei wrote: +if (!!cap_issubset(*inheritable, + cap_combine(target->cap_inheritable, + current->cap_bset))) { +/* no new pI capabilities outside bounding set */ +return -EPERM; +} Yes, the !! was a bug. The correct check is a single !. I was in trouble with getting -EPERM at pam_cap.so :-) (Thus, the correct check says no 'new' pI bits can be outside cap_bset.) If this condition intends to dominate 'new' pI bits by 'old' pI bits masked with bounding set, we should not apply cap_combine() here. I think applying cap_intersect() is correct for the purpose. The check is not meant to limit existing pI bits. The check is meant to limit what new bits can be 'added' to pI (in the case that pE & CAP_SETPCAP is true). Thanks, I got understood as I wrote in the previous reply. BTW, could you tell me your intention about pam_cap.c is implemented with pam_sm_authenticate() and pam_sm_setcred()? I think it can be done with pam_sm_open_session(), and this approach enables to reduce the iteration of reading /etc/security/capability.conf. How do you think the idea? -- OSS Platform Development Division, NEC KaiGai Kohei <[EMAIL PROTECTED]> -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)
(Thus, the correct check says no 'new' pI bits can be outside cap_bset.) If this condition intends to dominate 'new' pI bits by 'old' pI bits masked with bounding set, we should not apply cap_combine() here. I think applying cap_intersect() is correct for the purpose. That would have been my first inclination, but Andrew actually wanted to be able to keep a pI with bits not in the capability bounding set. And it's really not a big problem, since 1. you can never grow cap_bset 2. the capbound.c program just makes sure to call capset to take the bit being removed from cap_bset out of pI' 3. It could be advantageous for some daemon to keep a bit in pI which can never be gained through fP but can be gained by a child through (fI&pI). Does that seem reasonable to you? OK, I got understood the intention of the condition. It seems to me reasonable policy. Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei <[EMAIL PROTECTED]> This patch fixes incorrect condition added by per-process capability bounding set patch. It intends to limit no new pI capabilities outside bounding set. Signed-off-by: KaiGai Kohei <[EMAIL PROTECTED]> commoncap.c |6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) --- linux-2.6.24-rc3/security/commoncap.c.old 2007-12-06 10:51:48.0 +0900 +++ linux-2.6.24-rc3/security/commoncap.c 2007-12-06 10:52:15.0 +0900 @@ -119,9 +119,9 @@ int cap_capset_check (struct task_struct /* incapable of using this inheritable set */ return -EPERM; } - if (!!cap_issubset(*inheritable, - cap_combine(target->cap_inheritable, - current->cap_bset))) { + if (!cap_issubset(*inheritable, + cap_combine(target->cap_inheritable, + current->cap_bset))) { /* no new pI capabilities outside bounding set */ return -EPERM; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)
Andrew Morgan wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 KaiGai Kohei wrote: Serge, Please tell me the meanings of the following condition. diff --git a/security/commoncap.c b/security/commoncap.c index 3a95990..cb71bb0 100644 --- a/security/commoncap.c +++ b/security/commoncap.c @@ -133,6 +119,12 @@ int cap_capset_check (struct task_struct *target, kernel_cap_t *effective, /* incapable of using this inheritable set */ return -EPERM; } +if (!!cap_issubset(*inheritable, + cap_combine(target->cap_inheritable, + current->cap_bset))) { +/* no new pI capabilities outside bounding set */ +return -EPERM; +} /* verify restrictions on target's new Permitted set */ if (!cap_issubset (*permitted, It seems to me this condition requires the new inheritable capability set must have a capability more than bounding set, at least. What is the purpose of this checking? Yes, the !! was a bug. The correct check is a single !. I was in trouble with getting -EPERM at pam_cap.so :-) (Thus, the correct check says no 'new' pI bits can be outside cap_bset.) If this condition intends to dominate 'new' pI bits by 'old' pI bits masked with bounding set, we should not apply cap_combine() here. I think applying cap_intersect() is correct for the purpose. Thanks, -- KaiGai Kohei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)
Serge, Please tell me the meanings of the following condition. diff --git a/security/commoncap.c b/security/commoncap.c index 3a95990..cb71bb0 100644 --- a/security/commoncap.c +++ b/security/commoncap.c @@ -133,6 +119,12 @@ int cap_capset_check (struct task_struct *target, kernel_cap_t *effective, /* incapable of using this inheritable set */ return -EPERM; } + if (!!cap_issubset(*inheritable, + cap_combine(target->cap_inheritable, + current->cap_bset))) { + /* no new pI capabilities outside bounding set */ + return -EPERM; + } /* verify restrictions on target's new Permitted set */ if (!cap_issubset (*permitted, It seems to me this condition requires the new inheritable capability set must have a capability more than bounding set, at least. What is the purpose of this checking? In the initial state, any process have no inheritable capability set and full bounding set. Thus, we cannot do capset() always. Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei <[EMAIL PROTECTED]> -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)
Andrew Morgan wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 KaiGai Kohei wrote: There is already a pam_cap module in the libcap2 package. Can we merge this functionality? I think it is a good idea. However, this module already have a feature to modify inheritable capability set. How does it to be described in the "/etc/security/capability.conf"? One idea is like a following convention: # compatible configuration. We can omit "i:" at the head of line cap_setfcap tak # It drops any capabilities from b-set except for cap_net_raw and cap_fowner b:cap_net_raw,cap_fownerymj # It drops only cap_dac_override from b-set. b:-cap_dac_override kaigai # It drops only cap_sys_admin from b-set of any user within users group. b:-cap_sys_admingroup:users I like the idea of a separate line for bounds. For ease of parsing, perhaps '!' or some other symbol prefix to the line could be used to identify lines that refer to cap_bound? In other modules, @groupname is used to capture a group association. Lines like this should be supported: !cap_net_raw @regularusers# suppress from cap_bset cap_net_raw @pingers morgan # add to pI where morgan is not in group @pingers but is in group @regularusers. The "@groupname" is intuitive convention. I also think it is good idea. But !cap_xxx is a bit misunderstandable for me. Someone may misunderstand this line means any capabilities except for cap_xxx. Thus, I think that using "b:" and omittable "i:" prefix is better than "!". In addition, what is your opinion about using "-b:" and "-i:" to represent dropping capabilities currently they have? There is one more uncertain case. When a user belongs to several groups with capabilities configuration, what capabilities are to be attached for the user? e.g) When kaigai belong to @pingers and @paccters b:cap_sys_pacct @paccters b:cap_net_raw @pingers -b:cap_dac_override,cap_net_raw kaigai If we apply "OR" policy, kaigai get only cap_sys_pacct, because he got cap_sys_pacct and cap_net_raw came from @paccters and @pingers but cap_dac_override and cap_net_raw are dropped by the third line. Thanks, Cheers Andrew Thanks, Cheers Andrew [EMAIL PROTECTED] wrote: Quoting KaiGai Kohei ([EMAIL PROTECTED]): Serge E. Hallyn wrote: The capability bounding set is a set beyond which capabilities cannot grow. Currently cap_bset is per-system. It can be manipulated through sysctl, but only init can add capabilities. Root can remove capabilities. By default it includes all caps except CAP_SETPCAP. Serge, This feature makes me being interested in. I think you intend to apply this feature for the primary process of security container. However, it is also worthwhile to apply when a session is starting up. The following PAM module enables to drop capability bounding bit specified by the fifth field in /etc/passwd entry. This code is just an example now, but considerable feature. build and install: # gcc -Wall -c pam_cap_drop.c # gcc -Wall -shared -Xlinker -x -o pam_cap_drop.so pam_cap_drop.o -lpam # cp pam_cap_drop.so /lib/security modify /etc/passwd as follows: tak:x:1004:100:cap_drop=cap_net_raw,cap_chown:/home/tak:/bin/bash ^^ example: [EMAIL PROTECTED] ~]$ ping 192.168.1.1 PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data. 64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=1.23 ms 64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=1.02 ms --- 192.168.1.1 ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 999ms rtt min/avg/max/mdev = 1.023/1.130/1.237/0.107 ms [EMAIL PROTECTED] ~]$ ssh [EMAIL PROTECTED] [EMAIL PROTECTED]'s password: Last login: Sat Dec 1 10:09:29 2007 from masu.myhome.cx [EMAIL PROTECTED] ~]$ export LANG=C [EMAIL PROTECTED] ~]$ ping 192.168.1.1 ping: icmp open socket: Operation not permitted [EMAIL PROTECTED] ~]$ su Password: pam_cap_bset[6921]: user root does not have 'cap_drop=' property [EMAIL PROTECTED] tak]# cat /proc/self/status | grep ^Cap CapInh: CapPrm: dffe CapEff: dffe [EMAIL PROTECTED] tak]# Neat. A bigger-stick version of not adding the account to group wheel. I'll use that. Is there any reason not to have a separate /etc/login.capbounds config file, though, so the account can still have a full name? Did you only use that for convenience of proof of concept, or is there another reason? # BTW, I replaced the James's address in the Cc: list, # because MTA does not accept it. Thanks! I don't know what happened to my alias for him... thanks, -serge -- KaiGai Kohei <[EMAIL PROTECTED]> pam_cap_drop.c ******** /* * pam_cap_drop.c mod
Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)
There is already a pam_cap module in the libcap2 package. Can we merge this functionality? I think it is a good idea. However, this module already have a feature to modify inheritable capability set. How does it to be described in the "/etc/security/capability.conf"? One idea is like a following convention: # compatible configuration. We can omit "i:" at the head of line cap_setfcap tak # It drops any capabilities from b-set except for cap_net_raw and cap_fowner b:cap_net_raw,cap_fownerymj # It drops only cap_dac_override from b-set. b:-cap_dac_override kaigai # It drops only cap_sys_admin from b-set of any user within users group. b:-cap_sys_admingroup:users Thanks, Cheers Andrew [EMAIL PROTECTED] wrote: Quoting KaiGai Kohei ([EMAIL PROTECTED]): Serge E. Hallyn wrote: The capability bounding set is a set beyond which capabilities cannot grow. Currently cap_bset is per-system. It can be manipulated through sysctl, but only init can add capabilities. Root can remove capabilities. By default it includes all caps except CAP_SETPCAP. Serge, This feature makes me being interested in. I think you intend to apply this feature for the primary process of security container. However, it is also worthwhile to apply when a session is starting up. The following PAM module enables to drop capability bounding bit specified by the fifth field in /etc/passwd entry. This code is just an example now, but considerable feature. build and install: # gcc -Wall -c pam_cap_drop.c # gcc -Wall -shared -Xlinker -x -o pam_cap_drop.so pam_cap_drop.o -lpam # cp pam_cap_drop.so /lib/security modify /etc/passwd as follows: tak:x:1004:100:cap_drop=cap_net_raw,cap_chown:/home/tak:/bin/bash ^^ example: [EMAIL PROTECTED] ~]$ ping 192.168.1.1 PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data. 64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=1.23 ms 64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=1.02 ms --- 192.168.1.1 ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 999ms rtt min/avg/max/mdev = 1.023/1.130/1.237/0.107 ms [EMAIL PROTECTED] ~]$ ssh [EMAIL PROTECTED] [EMAIL PROTECTED]'s password: Last login: Sat Dec 1 10:09:29 2007 from masu.myhome.cx [EMAIL PROTECTED] ~]$ export LANG=C [EMAIL PROTECTED] ~]$ ping 192.168.1.1 ping: icmp open socket: Operation not permitted [EMAIL PROTECTED] ~]$ su Password: pam_cap_bset[6921]: user root does not have 'cap_drop=' property [EMAIL PROTECTED] tak]# cat /proc/self/status | grep ^Cap CapInh: CapPrm: dffe CapEff: dffe [EMAIL PROTECTED] tak]# Neat. A bigger-stick version of not adding the account to group wheel. I'll use that. Is there any reason not to have a separate /etc/login.capbounds config file, though, so the account can still have a full name? Did you only use that for convenience of proof of concept, or is there another reason? # BTW, I replaced the James's address in the Cc: list, # because MTA does not accept it. Thanks! I don't know what happened to my alias for him... thanks, -serge -- KaiGai Kohei <[EMAIL PROTECTED]> pam_cap_drop.c /* * pam_cap_drop.c module -- drop capabilities bounding set * * Copyright: 2007 KaiGai Kohei <[EMAIL PROTECTED]> */ #include #include #include #include #include #include #include #include #include #ifndef PR_CAPBSET_DROP #define PR_CAPBSET_DROP 24 #endif static char *captable[] = { "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner", "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap", "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast", "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner", "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace", "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice", "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod", "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap", NULL, }; PAM_EXTERN int pam_sm_open_session(pam_handle_t *pamh, int flags, int argc, const char **argv) { struct passwd *pwd; char
Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)
Serge, Is there any reason not to have a separate /etc/login.capbounds config file, though, so the account can still have a full name? Did you only use that for convenience of proof of concept, or is there another reason? passwd(5) says the fifth field is optional and only used for informational purpose (like ulimit, umask). However, using any other separate config file is conservative and better. One candidate is "/etc/security/capability.conf" defined as the config file of pam_cap. Thanks, -- KaiGai Kohei <[EMAIL PROTECTED]> -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)
Serge E. Hallyn wrote: > The capability bounding set is a set beyond which capabilities > cannot grow. Currently cap_bset is per-system. It can be > manipulated through sysctl, but only init can add capabilities. > Root can remove capabilities. By default it includes all caps > except CAP_SETPCAP. Serge, This feature makes me being interested in. I think you intend to apply this feature for the primary process of security container. However, it is also worthwhile to apply when a session is starting up. The following PAM module enables to drop capability bounding bit specified by the fifth field in /etc/passwd entry. This code is just an example now, but considerable feature. build and install: # gcc -Wall -c pam_cap_drop.c # gcc -Wall -shared -Xlinker -x -o pam_cap_drop.so pam_cap_drop.o -lpam # cp pam_cap_drop.so /lib/security modify /etc/passwd as follows: tak:x:1004:100:cap_drop=cap_net_raw,cap_chown:/home/tak:/bin/bash ^^ example: [EMAIL PROTECTED] ~]$ ping 192.168.1.1 PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data. 64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=1.23 ms 64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=1.02 ms --- 192.168.1.1 ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 999ms rtt min/avg/max/mdev = 1.023/1.130/1.237/0.107 ms [EMAIL PROTECTED] ~]$ ssh [EMAIL PROTECTED] [EMAIL PROTECTED]'s password: Last login: Sat Dec 1 10:09:29 2007 from masu.myhome.cx [EMAIL PROTECTED] ~]$ export LANG=C [EMAIL PROTECTED] ~]$ ping 192.168.1.1 ping: icmp open socket: Operation not permitted [EMAIL PROTECTED] ~]$ su Password: pam_cap_bset[6921]: user root does not have 'cap_drop=' property [EMAIL PROTECTED] tak]# cat /proc/self/status | grep ^Cap CapInh: CapPrm: dffe CapEff: dffe [EMAIL PROTECTED] tak]# # BTW, I replaced the James's address in the Cc: list, # because MTA does not accept it. -- KaiGai Kohei <[EMAIL PROTECTED]> pam_cap_drop.c /* * pam_cap_drop.c module -- drop capabilities bounding set * * Copyright: 2007 KaiGai Kohei <[EMAIL PROTECTED]> */ #include #include #include #include #include #include #include #include #include #ifndef PR_CAPBSET_DROP #define PR_CAPBSET_DROP 24 #endif static char *captable[] = { "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner", "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap", "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast", "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner", "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace", "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice", "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod", "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap", NULL, }; PAM_EXTERN int pam_sm_open_session(pam_handle_t *pamh, int flags, int argc, const char **argv) { struct passwd *pwd; char *pos, *buf; char *username = NULL; /* open system logger */ openlog("pam_cap_bset", LOG_PERROR | LOG_PID, LOG_AUTHPRIV); /* get the unix username */ if (pam_get_item(pamh, PAM_USER, (void *) &username) != PAM_SUCCESS || !username) return PAM_USER_UNKNOWN; /* get the passwd entry */ pwd = getpwnam(username); if (!pwd) return PAM_USER_UNKNOWN; /* Is there "cap_drop=" ? */ pos = strstr(pwd->pw_gecos, "cap_drop="); if (pos) { buf = strdup(pos + sizeof("cap_drop=") - 1); if (!buf) return PAM_SESSION_ERR; pos = strtok(buf, ","); while (pos) { int rc, i; for (i=0; captable[i]; i++) { if (!strcmp(pos, captable[i])) { rc = prctl(PR_CAPBSET_DROP, i); if (rc < 0) {
Re: [2.6 patch] make jffs2_get_acl() static
Adrian Bunk wrote: jffs2_get_acl() can now become static again. Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]> Acked-by: KaiGai Kohei <[EMAIL PROTECTED]> --- fs/jffs2/acl.c |2 +- fs/jffs2/acl.h |2 -- 2 files changed, 1 insertion(+), 3 deletions(-) add2b887d64536f3fe978e62f0774292456f1ddb diff --git a/fs/jffs2/acl.c b/fs/jffs2/acl.c index 9728614..b14e805 100644 --- a/fs/jffs2/acl.c +++ b/fs/jffs2/acl.c @@ -176,7 +176,7 @@ static void jffs2_iset_acl(struct inode *inode, struct posix_acl **i_acl, struct spin_unlock(&inode->i_lock); } -struct posix_acl *jffs2_get_acl(struct inode *inode, int type) +static struct posix_acl *jffs2_get_acl(struct inode *inode, int type) { struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode); struct posix_acl *acl; diff --git a/fs/jffs2/acl.h b/fs/jffs2/acl.h index 76c6ebd..0bb7f00 100644 --- a/fs/jffs2/acl.h +++ b/fs/jffs2/acl.h @@ -28,7 +28,6 @@ struct jffs2_acl_header { #define JFFS2_ACL_NOT_CACHED ((void *)-1) -extern struct posix_acl *jffs2_get_acl(struct inode *inode, int type); extern int jffs2_permission(struct inode *, int, struct nameidata *); extern int jffs2_acl_chmod(struct inode *); extern int jffs2_init_acl_pre(struct inode *, struct inode *, int *); @@ -40,7 +39,6 @@ extern struct xattr_handler jffs2_acl_default_xattr_handler; #else -#define jffs2_get_acl(inode, type) (NULL) #define jffs2_permission (NULL) #define jffs2_acl_chmod(inode) (0) #define jffs2_init_acl_pre(dir_i,inode,mode) (0) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [TOMOYO 05/15](repost) Domain transition handler functions.
Tetsuo Handa wrote: James Morris wrote: I'm pretty sure that the singly linked list idea has been rejected a few times. Just use the existing API. Too bad... Well, is there a way to avoid read_lock when reading list? Currently, TOMOYO Linux avoids read_lock, on the assumption that (1) First, ptr->next is initialized with NULL. (2) Later, ptr->next is assigned non-NULL address. (3) Assigning to ptr->next is done atomically. Regards. Is it all of the purpose for the list structure? If so, you can apply RCU instead to avoid read lock when scanning the list, like: rcu_read_lock(); list_for_each_entry(...) { } rcu_read_unlock(); -- KaiGai Kohei <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Wrappers to load bitmaps (Re: [PATCH] Improve ebitmap scanning)
Now I'm improving the performance to scan bitmap in SELinux, with replacing its original bitmap implementation (ebitmap) by common bitops like find_next_bit(). I posted a patch to replace them, however, it got a bit complex bacause we had to translate u64 <--> unsigned long by myself to adjust between the format of security policy and common bitops. http://marc.info/?l=selinux&m=118956715414494&w=2 I have an idea to provide several wrapper functions to copy u64/u32 to/from unsigned long for each architecture. Maybe, it will be defined as follows: int arraycpy_u64_to_ulong(u64 *src, unsigned long *dest, size_t len); int arraycpy_ulong_to_u64(unsigned long *src, u64 *dest, size_t len); I believe this feature will help getting code simpler and reducing bugs for any other subsystem, not only SELinux, which loads bitmaps from/to userspace and handle them using common bitops. Any comment please. Stephen Smalley wrote: > On Thu, 2007-09-13 at 10:37 +0900, KaiGai Kohei wrote: >> Paul Moore wrote: >>> On Tuesday, September 11 2007 11:08:44 pm KaiGai Kohei wrote: >>>> The attached patch applies the standard bitmap operations >>>> for the iteration macro of ebitmap, and enables to improve >>>> the performance in AVC-misses case. <...snip...> >> BTW, is there any wrapper to copy an array of u64 to/from architecture >> specific >> unsigned long? If so, it will help implement ebitmap_netlbl_{import|export}() >> and ebitmap_read() more simply. > > Might want to ask on linux-kernel. More generally, it might be a good > idea to cc linux-kernel on your next posting of the patch to get wider > review of how you are using the native linux bitmap support. > > The patch looks very promising, although a detailed review and testing > might take a little bit. -- OSS Platform Development Division, NEC KaiGai Kohei <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] file capabilities: introduce cap_setfcap
Serge E. Hallyn wrote: Here's the first patch (of several or many to come) to address some of Andrew's comments. Kaigai, IIUC cap_names.h will eventually be automatically updated? (I had to manually tweak it for testing as the new kernel sources were not located on the test system) The origin of cap_names.h is "/usr/include/linux/capability.h". Some scripts kicked by Makefile convert it, then cap_names.h will be generated. I don't know whether we can expect the kernel headers are always deployed under "/usr/include/linux", or not. In Fedora system, the kernel-headers package deploys all headers there, so cap_names.h will eventually be automatically updated. Thanks, thanks, -serge From fefcd341e478bd9e490d34abe9efd3c3c4f0b8a0 Mon Sep 17 00:00:00 2001 From: Serge E. Hallyn <[EMAIL PROTECTED]> Date: Wed, 27 Jun 2007 13:09:20 -0400 Subject: [PATCH 1/1] file capabilities: introduce cap_setfcap Setting file capabilities previously required the cap_sys_admin capability, since they are stored as extended attributes in the security.* namespace. Introduce CAP_SETFCAP (to mirror CAP_SETPCAP), and require it for setting file capabilities instead of CAP_SYS_ADMIN. Quoting Andrew Morgan, "CAP_SYS_ADMIN is way too overloaded and this functionality is special." Signed-off-by: Serge E. Hallyn <[EMAIL PROTECTED]> --- include/linux/capability.h |4 +++- security/commoncap.c | 12 ++-- 2 files changed, 13 insertions(+), 3 deletions(-) diff --git a/include/linux/capability.h b/include/linux/capability.h index 89125df..cdfaa10 100644 --- a/include/linux/capability.h +++ b/include/linux/capability.h @@ -324,7 +324,9 @@ typedef __u32 kernel_cap_t; #define CAP_AUDIT_CONTROL30 -#define CAP_NUMCAPS 31 +#define CAP_SETFCAP 31 + +#define CAP_NUMCAPS 32 #ifdef __KERNEL__ /* diff --git a/security/commoncap.c b/security/commoncap.c index 4e9ff02..24de4fa 100644 --- a/security/commoncap.c +++ b/security/commoncap.c @@ -290,7 +290,11 @@ int cap_bprm_secureexec (struct linux_binprm *bprm) int cap_inode_setxattr(struct dentry *dentry, char *name, void *value, size_t size, int flags) { - if (!strncmp(name, XATTR_SECURITY_PREFIX, + if (!strcmp(name, XATTR_NAME_CAPS)) { + if (!capable(CAP_SETFCAP)) + return -EPERM; + return 0; + } else if (!strncmp(name, XATTR_SECURITY_PREFIX, sizeof(XATTR_SECURITY_PREFIX) - 1) && !capable(CAP_SYS_ADMIN)) return -EPERM; @@ -299,7 +303,11 @@ int cap_inode_setxattr(struct dentry *dentry, char *name, void *value, int cap_inode_removexattr(struct dentry *dentry, char *name) { - if (!strncmp(name, XATTR_SECURITY_PREFIX, + if (!strcmp(name, XATTR_NAME_CAPS)) { + if (!capable(CAP_SETFCAP)) + return -EPERM; + return 0; + } else if (!strncmp(name, XATTR_SECURITY_PREFIX, sizeof(XATTR_SECURITY_PREFIX) - 1) && !capable(CAP_SYS_ADMIN)) return -EPERM; -- Open Source Software Promotion Center, NEC KaiGai Kohei <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [PATCH -mm] file caps: make on-disk capabilities future-proof
itable & CAP_TO_MASK(i)) - return -EPERM; - } - return 0; } @@ -148,50 +145,65 @@ static int set_file_caps(struct linux_bi { struct dentry *dentry; ssize_t rc; - struct vfs_cap_data_disk dcaps; + struct vfs_cap_data_disk *dcaps; struct vfs_cap_data caps; struct inode *inode; - int err; if (bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID) return 0; dentry = dget(bprm->file->f_dentry); inode = dentry->d_inode; - if (!inode->i_op || !inode->i_op->getxattr) { - dput(dentry); - return 0; + rc = 0; + if (!inode->i_op || !inode->i_op->getxattr) + goto out; + + rc = inode->i_op->getxattr(dentry, XATTR_NAME_CAPS, NULL, 0); + if (rc == -ENODATA) { + rc = 0; + goto out; + } + if (rc < 0) + goto out; + if (rc < sizeof(struct vfs_cap_data_disk)) { + rc = -EINVAL; + goto out; + } + + dcaps = kmalloc(rc, GFP_KERNEL); + if (!dcaps) { + rc = -ENOMEM; + goto out; + } + rc = inode->i_op->getxattr(dentry, XATTR_NAME_CAPS, dcaps, + XATTR_CAPS_SZ); + if (rc == -ENODATA) { + rc = 0; + goto out_free; } - rc = inode->i_op->getxattr(dentry, XATTR_NAME_CAPS, &dcaps, - sizeof(dcaps)); - dput(dentry); - - if (rc == -ENODATA) - return 0; - if (rc < 0) { printk(KERN_NOTICE "%s: Error (%zd) getting xattr\n", __FUNCTION__, rc); - return rc; + goto out_free; } - if (rc != sizeof(dcaps)) { - printk(KERN_NOTICE "%s: got wrong size for getxattr (%zd)\n", - __FUNCTION__, rc); - return -EPERM; - } - - cap_from_disk(&dcaps, &caps); - err = check_cap_sanity(&caps); - if (err) - return err; + rc = cap_from_disk(dcaps, &caps, rc); + if (rc) + goto out_free; + rc = check_cap_sanity(&caps); + if (rc) + goto out_free; bprm->cap_effective = caps.effective; bprm->cap_permitted = caps.permitted; bprm->cap_inheritable = caps.inheritable; - return 0; +out_free: + kfree(dcaps); +out: + dput(dentry); + return rc; } #else static inline int set_file_caps(struct linux_binprm *bprm) -- KaiGai Kohei <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Implement file posix capabilities
Oops, it's my stupid bug. Ah, this actually makes sense. The setfcaps usage() statement does for (i=0; _cap_names[i]; i++) { printf... so it expects _cap_names to end with a terminating NULL, but that doesn't seem to be the case in cap_names.h in libcap. KaiGai, perhaps setfcaps should do something like diff setfcaps.c.orig setfcaps.c 25c25 < for (i=0; _cap_names[i]; i++) --- for (i=0; i<__CAP_BITS; i++) I fixed the matter as follows: [EMAIL PROTECTED] libcap-1.10.kg]$ env LANG=C svn diff -r2:3 Index: libcap/_makenames.c === --- libcap/_makenames.c (revision 2) +++ libcap/_makenames.c (revision 3) @@ -45,6 +45,7 @@ "#define __CAP_BITS %d\n" "\n" "#ifdef LIBCAP_PLEASE_INCLUDE_ARRAY\n" + " int const _cap_names_num = __CAP_BITS;\n" " char const *_cap_names[__CAP_BITS] = {\n", maxcaps); for (i=0; i #include +extern int const _cap_names_num; extern char const *_cap_names[]; static void usage() { @@ -21,8 +22,8 @@ int i; fputs(message, stderr); -for (i=0; _cap_names[i]; i++) - fprintf(stderr, "%s%s", i%4==0 ? "\n\t" : ", ", _cap_names[i]); +for (i=0; i < _cap_names_num; i++) +fprintf(stderr, "%s%s", i%4==0 ? "\n\t" : ", ", _cap_names[i]); fputc('\n', stderr); exit(0); } [EMAIL PROTECTED] libcap-1.10.kg]$ Because '__CAP_BITS' is decided at compiling time, I think it's not appropriate to indicate the length of _cap_names[] which is linked at run time. Therefore, I add a new integer variable _cap_names_num to represent the length of _cap_names at run time. You can download it from http://www.kaigai.gr.jp/index.php?FrontPage#b556e50d Thanks, -- KaiGai Kohei <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.11-rc4-mm1] connector: Add a fork connector
Hello, Guillaume I tried to measure the process-creation/destruction performance on 2.6.11-rc4-mm1 plus some extensiton(Normal/with PAGG/with Fork-Connector). But I received a following messages endlessly on system console with Fork-Connector extensiton. # on IA-64 environment / When an simple fork() iteration is run in parallel. skb does not have enough length: requested msg->len=10[28], nlh->nlmsg_len=48[32], skb->len=48[must be 30]. skb does not have enough length: requested msg->len=10[28], nlh->nlmsg_len=48[32], skb->len=48[must be 30]. skb does not have enough length: requested msg->len=10[28], nlh->nlmsg_len=48[32], skb->len=48[must be 30]. : Is's generated at drivers/connector/connector.c:__cn_rx_skb(), and this warn the length of msg's payload does not fit in nlmsghdr's length. This message means netlink packet is not sent to user space. I was notified occurence of fork() by printk(). :-( The attached simple *.c file can enable/disable fork-connector and listen the fork-notification. Because It's first experimence for me to write a code to use netlink, point out a right how-to-use if there's some mistakes at user side apprication. Thanks. P.S. I can't reproduce lockup on 367th-fork() with your latest patch. Guillaume Thouvenin wrote: > ChangeLog: > > - Add parenthesis around sizeof(struct cn_msg) + CN_FORK_INFO_SIZE > in the CN_FORK_MSG_SIZE macro > - fork_cn_lock is declareed with DEFINE_SPINLOCK() > - fork_cn_lock is defined as static and local to fork_connector() > - Create a specific module cn_fork.c in drivers/connector to > register the callback. > - Improve the callback that turns on/off the fork connector > > I also run the lmbench and results are send in response to another > thread "A common layer for Accounting packages". When fork connector is > turned off the overhead is negligible. This patch works with another > small patch that fix a problem in the connector. Without it, there is a > message that says "skb does not have enough length". It will be fix in > the next -mm tree I think. > > > Thanks everyone for the comments, > Guillaume -- Linux Promotion Center, NEC KaiGai Kohei <[EMAIL PROTECTED]>#include #include #include #include #include #include #include void usage(){ puts("usage: fclisten "); puts(" Default -> listening fork-connector"); puts(" on -> fork-connector Enable"); puts(" off -> fork-connector Disable"); exit(0); } #define MODE_LISTEN (1) #define MODE_ENABLE (2) #define MODE_DISABLE (3) struct cb_id { __u32 idx; __u32 val; }; struct cn_msg { struct cb_idid; __u32 seq; __u32 ack; __u32 len;/* Length of the following data */ __u8data[0]; }; int main(int argc, char *argv[]){ char buf[4096]; int mode, sockfd, len; struct sockaddr_nl ad; struct nlmsghdr *hdr = (struct nlmsghdr *)buf; struct cn_msg *msg = (struct cn_msg *)(buf+sizeof(struct nlmsghdr)); switch(argc){ case 1: mode = MODE_LISTEN; break; case 2: if (strcasecmp("on",argv[1])==0) { mode = MODE_ENABLE; }else if (strcasecmp("off",argv[1])==0){ mode = MODE_DISABLE; }else{ usage(); } break; default: usage(); break; } if( (sockfd=socket(PF_NETLINK, SOCK_RAW, NETLINK_NFLOG)) < 0 ){ fprintf(stderr, "Fault on socket().\n"); return( 1 ); } ad.nl_family = AF_NETLINK; ad.nl_pad = 0; ad.nl_pid = getpid(); ad.nl_groups = -1; if( bind(sockfd, (struct sockaddr *)&ad, sizeof(ad)) ){ fprintf(stderr, "Fault on bind to netlink.\n"); return( 2 ); } if (mode==MODE_LISTEN) { while(-1){ len = recvfrom(sockfd, buf, 4096, 0, NULL, NULL); printf("%d-byte recv Seq=%d\n", len, hdr->nlmsg_seq); } }else{ ad.nl_family = AF_NETLINK; ad.nl_pad = 0; ad.nl_pid = 0; ad.nl_groups = 1; hdr->nlmsg_len = sizeof(struct nlmsghdr) + sizeof(struct cn_msg) + sizeof(int); hdr->nlmsg_type = 0; hdr->nlmsg_flags = 0; hdr->nlmsg_seq = 0; hdr->nlmsg_pid = getpid(); msg->id.idx = 0xfeed; msg->id.val = 0xbeef; msg->seq = msg->ack = 0; msg->len = sizeof(int); if (mode==MODE_ENABLE){ (*(int *)(msg->data)) = 1; } else { (*(int *)(msg->data)) = 0; } sendto(sockfd, buf, sizeof(struct nlmsghdr)+sizeof(struct cn_msg)+sizeof(int), 0, (struct sockaddr *)&ad, sizeof(ad)); } }
Re: [Lse-tech] Re: A common layer for Accounting packages
Hello, > I tested without user space listeners and the cost is negligible. I will > test with a user space listeners and see the results. I'm going to run > the test this week after improving the mechanism that switch on/off the > sending of the message. I'm also trying to mesure the process-creation/destruction performance on following three environment. Archtechture: i686 / Distribution: Fedora Core 3 * Kernel Preemption is DISABLE * SMP kernel but UP-machine / Not Hyper Threading [1] 2.6.11-rc4-mm1 normal [2] 2.6.11-rc4-mm1 with PAGG based Process Accounting Module [3] 2.6.11-rc4-mm1 with fork-connector notification (it's enabled) When 367th-fork() was called after fork-connector notification, kernel was locked up. (User-Space-Listener has been also run until 366th-fork() notification was received) Does this number have any sort of means ? In my second trial, kernel was also locked up after 366th-fork() notification. Currently, I don't know its causition. Is there a person encounted it? # I wanted to say "[2] is faster than [3]" when process-grouping is enable, but the plan came off. :( Thanks. -- Linux Promotion Center, NEC KaiGai Kohei <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Lse-tech] Re: A common layer for Accounting packages
Hello, Marcelo Tosatti wrote: > Yep, the netlink people should be able to help - they known what would be > required for not sending messages in case there is no listener registered. > > Maybe its already possible? I have never used netlink myself. If we notify the fork/exec/exit-events to user-space directly as you said, I don't think some hackings on netlink is necessary. For example, such packets is sent only when /proc/sys/.../process_grouping is set, and user-side daemon set this value, and unset when daemon will exit. It's not necessary to take too seriously. >>And, why can't netlink packets send always? >>If there are fork/exec/exit hooks, and they call CSA or other >>process-grouping modules, >>then those modules will decide whether packets for interaction with the >>daemon should be >>sent or not. > > > The netlink data will be sent to userspace at fork/exec/exit hooks - one wants > to avoid that if there are no listeners, so setups which dont want to run the > accounting daemon dont pay the cost of building and sending the information > through netlink. > > Thats what Andrew asked for if I understand correctly. Does it mean "netlink packets shouled be sent to userspace unconditionally." ? I have advocated steadfastly that fork/exec/exit hooks is necessary to support process-grouping and to account per process-grouping. It intend to be decided whether packets should be sent or not by hooked functions, in my understanding. Is it also one of the implementations whether using netlink-socket or not ? >>In most considerable case, CSA's kernel-loadable-module using such hooks >>will not be loaded >>when no accounting daemon is running. Adversely, this module must be loaded >>when accounting >>daemon needs CSA's netlink packets. >>Thus, it is only necessary to refer flag valiable and to execute >>conditional-jump >>when no-accounting daemon is running. > > > That would be one hack, although it is uglier than the pure netlink > selection. No, I can't agree this opinion. It means netlink-packets will be sent unconditionally when fork/exec/exit occur. Nobady can decide which packet is sent user-space, I think. In addition, the definition of process grouping is lightweight in many cases. For example, CpuSet can define own process-group by one increment-operation. I think it's not impossible to implement it in userspace, but it's not reasonable. An implementation as a kernel loadable module is reasonable and enough tiny. >>In my estimation, we must pay additional cost for an increment-operation, >>an decrement-op, >>an comparison-op and an conditional jump-op. It's enough lightweight, I >>think. >> >>For example: >>If CSA's module isn't loaded, 'privates_for_grouping' will be empty. >> >>inline int on_fork_hook(task_struct *parent, task_struct *newtask){ >> rcu_read_lock(); >> if( !list_empty(&parent->privates_for_grouping) ){ >>; >> } >> rcu_read_unlock(); >>} > > > Andrew has been talking about sending data over netlink to implement the > accounting at userspace, so this piece of code is out of the game, no? Indeed, I'm not opposed to implement the accounting in userspace and using netlink-socket for kernel-daemon communication. But definition of process-grouping based on any grouping policy should be done in kernel space at reasonability viewpoint. Thanks. -- Linux Promotion Center, NEC KaiGai Kohei <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Lse-tech] Re: A common layer for Accounting packages
Hi, Kaigai Kohei <[EMAIL PROTECTED]> wrote: In my understanding, what Andrew Morton said is "If target functionality can implement in user space only, then we should not modify the kernel-tree". fork, exec and exit upcalls sound pretty good to me. As long as a) they use the same common machinery and b) they are next-to-zero cost if something is listening on the netlink socket but no accounting daemon is running. b) would involved being able to avoid sending netlink messages in case there are no listeners. AFAIK that isnt possible currently, netlink sends packets unconditionally. Am I wrong? In current implementaion, you might be right. But we should make an effort to achieve the requirement-(b) from now. And, why can't netlink packets send always? If there are fork/exec/exit hooks, and they call CSA or other process-grouping modules, then those modules will decide whether packets for interaction with the daemon should be sent or not. In most considerable case, CSA's kernel-loadable-module using such hooks will not be loaded when no accounting daemon is running. Adversely, this module must be loaded when accounting daemon needs CSA's netlink packets. Thus, it is only necessary to refer flag valiable and to execute conditional-jump when no-accounting daemon is running. In my estimation, we must pay additional cost for an increment-operation, an decrement-op, an comparison-op and an conditional jump-op. It's enough lightweight, I think. For example: If CSA's module isn't loaded, 'privates_for_grouping' will be empty. inline int on_fork_hook(task_struct *parent, task_struct *newtask){ rcu_read_lock(); if( !list_empty(&parent->privates_for_grouping) ){ ; } rcu_read_unlock(); } Thanks, -- Linux Promotion Center, NEC KaiGai Kohei <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Lse-tech] Re: A common layer for Accounting packages
Sorry for this late reply. >> [1] Is it necessary 'fork/exec/exit' event handling framework ? .. >> Some process-aggregation model have own philosophy and implemantation, >> so it's hard to integrate. Thus, I think that common 'fork/exec/exit' >> event handling >> framework to implement any kinds of process-aggregation. > > > BSD needs an exit hook and ELSA needs a fork hook. I am still > evaluating whether CSA can use the ELSA module. If CSA can use the > ELSA module, CSA maybe would be fine with the fork hook. If CSA can use an ELSA module, then we must modify the kernel-tree for ELSA's fork-connecter. This means it's hard to implement the fork/exec/exit event notification to userspace (,or any kernel module) without kernel-support. How CSA shoule be implemented is interesting and important, but should it be main subject in this discussion that such a kinds of kernel hook is necessary to implement process-accounting per process-aggregation reasonable ? In my understanding, what Andrew Morton said is "If target functionality can implement in user space only, then we should not modify the kernel-tree". But, any kind of kernel support was required to handle process lifecycle events for the accounting per process-aggregation and so on, from our discussion. I'm also opposed to an adhoc approach, like CSA depending on ELSA. We should walk hight road. Thanks, -- Linux Promotion Center, NEC KaiGai Kohei <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Lse-tech] Re: A common layer for Accounting packages
Hi, Thanks for your comments. Andrew Morton wrote: >> Some process-aggregation model have own philosophy and implemantation, >> so it's hard to integrate. Thus, I think that common 'fork/exec/exit' event handling >> framework to implement any kinds of process-aggregation. > > > We really want to avoid doing such stuff in-kernel if at all possible, of > course. > > Is it not possible to implement the fork/exec/exit notifications to > userspace so that a daemon can track the process relationships and perform > aggregation based upon individual tasks' accounting? That's what one of > the accounting systems is proposing doing, I believe. > > (In fact, why do we even need the notifications? /bin/ps can work this > stuff out). It's hard to prove that we can't implement the process-aggregation only in user-space, but there are some difficulties on imaplementation, I think. For example, each process must have a tag or another identifier to explain what process-aggregation does it belong, but kernel does not support thoes kind of information, currently. Thus, we can't guarantee associating one process-aggregation with one process. # /proc//loginuid might be candidate, but it's out of original purpose. We might be able to make alike system, but is it hard to implement strict process-aggregation without any kernel supports? I think that well thought out kernel-modification is better than ad-hoc implementation on user-space. Thanks. -- Linux Promotion Center, NEC KaiGai Kohei <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Lse-tech] Re: A common layer for Accounting packages
Hi, Thanks for your comments. >> I think there are two issues about system accounting framework. >> >> Issue: 1) How to define the appropriate unit for accounting ? >> Current BSD-accountiong make a collection per process accounting >> information. >> CSA make additionally a collection per process-aggregation accounting. > > > The 'enhanced acct data collection' patches that were added to > 2-6-11-rc* tree still do collection of per process data. Hmm, I have not noticed this extension. But I made sure about it. The following your two patches implements enhanced data collection, didn't it? - ChangeLog for 2.6.11-rc1 [PATCH] enhanced I/O accounting data patch [PATCH] enhanced Memory accounting data collection Since making a collection per process accounting is unified to the stock kernel, I want to have a discussion about remaining half, "How to define the appropriate unit for accounting ?" We can agree that only per process-accounting is so rigid, I think. Then, process-aggregation should be provided in one way or another. [1] Is it necessary 'fork/exec/exit' event handling framework ? The common agreement for the method of dealing with process aggregation has not been constructed yet, I understood. And, we will not able to integrate each process aggregation model because of its diverseness. For example, a process which belong to JOB-A must not belong any other 'JOB-X' in CSA-model. But, In ELSA-model, a process in BANK-B can concurrently belong to BANK-B1 which is a child of BANK-B. And, there are other defferences: Whether a process not to belong to any process-aggregation is permitted or not ? Whether a process-aggregation should be inherited to child process or not ? (There is possibility not to be inherited in a rule-based process aggregation like CKRM) Some process-aggregation model have own philosophy and implemantation, so it's hard to integrate. Thus, I think that common 'fork/exec/exit' event handling framework to implement any kinds of process-aggregation. [2] What implementation should be adopted ? I think registerable hooks on fork/execve/exit is necessary, not only exit() hook. Because a rule or policy based process-aggregation model requirees to catch the transition of a process status. It might be enough to hook the exit() event only in process-accounting, but it's not kind for another customer. Thus, I recommend SGI's PAGG. In my understanding, the reason for not to include such a framework is that increase of unidentifiable (proprietary) modules is worried. But, SI can divert LSM to implemente process-aggregation if they ignore the LSM's original purpose, for example. # I'm strongly opposed to such a movement as a SELinux's user :-) So, I think such a fork/execve/exit hooks is harmless now. Is this the time to unify it? Thanks. > CSA added those per-process data to per-aggregation ("job") data > structure at do_exit() time when a process termintes. > >> >> It is appropriate to make the fork-exit event handling framework for >> definition >> of the process-aggregation, such as PAGG. >> >> This system-accounting per process-aggregation is quite useful, >> thought I tried the SGI's implementation named 'job' in past days. >> >> >> Issue: 2) What items should be collected for accounting information ? >> BSD-accounting collects PID/UID/GID, User/Sys/Elapsed-Time, and # of >> minor/major page faults. SGI's CSA collects VM/RSS size on exit time, >> Integral-VM/RSS, and amount of block-I/O additionally. > > > These data are now collected in 2.6.11-rc* code. Note that these data > are still per-process. > >> >> I think it's hard to implement the accounting-engine as a kernel loadable >> module using any kinds of framework. Because, we must put callback >> functions >> into all around the kernel for this purpose. >> >> Thus, I make a proposion as follows: >> We should separate the process-aggregation functionality and collecting >> accounting informations. > > > I totally agree with this! Actually that was what we have done. The data > collection part of code has been unified. > >> Something of framework to implement process-aggregation is necessary. >> And, making a collection of accounting information should be merged >> into BSD-accounting and implemented as a part of monolithic kernel >> as Guillaume said. > > > This sounds good. I am interested in learning how ELSA saves off > the per-process accounting data before the data got disposed. If > that scheme works for CSA, we would be very happy to adopt the > scheme. The current BSD scheme is very insufficient. The code is > very BSD centric and it provides no way to handle process-aggregation. > > Thanks, > - jay > >> >> Thanks. -- Linux Promotion Center, NEC KaiGai Kohei <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Lse-tech] Re: A common layer for Accounting packages
Hello, everyone. Andrew Morton wrote: > Jay Lan <[EMAIL PROTECTED]> wrote: > >>Since the need of Linux system accounting has gone beyond what BSD >>accounting provides, i think it is a good idea to create a thin layer >>of common code for various accounting packages, such as BSD accounting, >>CSA, ELSA, etc. The hook to do_exit() at exit.c was changed to invoke >>a routine in the common code which would then invoke those accounting >>packages that register to the acct_common to handle do_exit situation. > > > This all seems to be heading in the wrong direction. Do we really want to > have lots of different system accounting packages all hooking into a > generic we-cant-decide-what-to-do-so-we-added-some-pointless-overhead > framework? > > Can't we get _one_ accounting system in there, get it right, avoid the > framework? I think there are two issues about system accounting framework. Issue: 1) How to define the appropriate unit for accounting ? Current BSD-accountiong make a collection per process accounting information. CSA make additionally a collection per process-aggregation accounting. It is appropriate to make the fork-exit event handling framework for definition of the process-aggregation, such as PAGG. This system-accounting per process-aggregation is quite useful, thought I tried the SGI's implementation named 'job' in past days. Issue: 2) What items should be collected for accounting information ? BSD-accounting collects PID/UID/GID, User/Sys/Elapsed-Time, and # of minor/major page faults. SGI's CSA collects VM/RSS size on exit time, Integral-VM/RSS, and amount of block-I/O additionally. I think it's hard to implement the accounting-engine as a kernel loadable module using any kinds of framework. Because, we must put callback functions into all around the kernel for this purpose. Thus, I make a proposion as follows: We should separate the process-aggregation functionality and collecting accounting informations. Something of framework to implement process-aggregation is necessary. And, making a collection of accounting information should be merged into BSD-accounting and implemented as a part of monolithic kernel as Guillaume said. Thanks. -- Linux Promotion Center, NEC KaiGai Kohei <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/