[NET/IPv6] Race condition with flow_cache_genid?
Hi,

I was poking around trying to figure out how to install the Mobile IPv6 daemons this evening and noticed they required a kernel patch, although upon further inspection the kernel patch seemed to already be applied in 2.6.24. Unfortunately the flow cache appears to be horribly racy. Attached below are the only uses of the variable "flow_cache_genid" in 2.6.24.

Now, I am no expert in this particular area of the code, but the "atomic_t flow_cache_genid" variable is ONLY ever used with atomic_inc() and atomic_read(). There are no memory barriers or other dec_and_test()-style functions, so that variable could just as easily be replaced with a plain old C int. Basically either there is some missing locking here or it does not need to be "atomic_t". Judging from the way it *appears* to be used to check if cache entries are up-to-date with the latest changes in policy, I would guess the former.

In particular that whole "flow_cache_lookup()" thing looks racy as hell, since somebody could be in the middle of that looking at "if (fle->genid == atomic_read(&flow_cache_genid))". It does the atomic_read(), which BTW is literally implemented as:

    #define atomic_read(atomicvar) ((atomicvar)->value)

on some platforms. Immediately after the atomic read (or even before, since there's no cache-flush or read-modify-write), somebody calls into "selinux_xfrm_notify_policyload()" and increments the flow_cache_genid because SELinux just loaded a security policy. Now we're accepting a cache entry which applies to the PREVIOUS security policy. I can only assume that's really bad.

Even worse, there seems to be a race between SELinux loading a new policy and calling selinux_xfrm_notify_policyload(), since we could easily get packets and process them according to the old cache entry on one CPU before SELinux has had a chance to update the generation ID from the other. Furthermore, there's no guarantee the CPU caches will get updated in reasonable time.
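The check-then-use window described above can be replayed deterministically in userspace. The following is only a sketch under stated assumptions: it uses C11 atomics rather than the kernel's atomic_t, and cache_genid, struct cache_entry, entry_is_current(), policy_reload() and replay_race() are hypothetical stand-ins, not kernel symbols. It runs the reader and the writer in one thread, in the problematic order, to show that an atomic read alone does not stop a stale entry from being used.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for flow_cache_genid. */
static atomic_int cache_genid;

/* Hypothetical stand-in for a flow cache entry. */
struct cache_entry {
    int genid;        /* generation at which the entry was filled */
    const char *obj;  /* cached policy decision */
};

/* Step 1 of a lookup: compare the entry's generation with the global one. */
static bool entry_is_current(const struct cache_entry *e)
{
    return e->genid == atomic_load(&cache_genid);
}

/* What a policy reload does: bump the generation to invalidate entries. */
static void policy_reload(void)
{
    atomic_fetch_add(&cache_genid, 1);
}

/*
 * Replays one bad interleaving in a single thread: the reader validates
 * the entry, the writer bumps the generation, and the reader then uses
 * the entry it already accepted. Returns the object the reader ends up
 * using, which is stale by the time it is returned.
 */
static const char *replay_race(struct cache_entry *e)
{
    bool ok = entry_is_current(e);   /* reader: check passes */

    policy_reload();                 /* writer: policy changes here */
    return ok ? e->obj : NULL;       /* reader: uses the stale entry */
}
```

Both operations are individually atomic here, yet the entry returned by replay_race() no longer passes entry_is_current() afterwards, which is the argument for putting the check and the use under a common lock.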
Clearly SELinux needs to have some way of atomically invalidating the flow cache of all CPUs *simultaneously* with loading a new policy, which probably means they both need to be under the same lock, or something. The same problem appears to occur with updating the XFRM policy and incrementing flow_cache_genid.

Probably the fastest solution is to put the flow cache under the xfrm_policy_lock (which already disables local bottom-halves), and either take that lock during SELinux policy load or if there are lock ordering problems then add a variable "flow_cache_ignore" and change the xfrm_notify hooks:

    void selinux_xfrm_notify_policyload_pre(void)
    {
            write_lock_bh(&xfrm_policy_lock);
            flow_cache_genid++;
            flow_cache_ignore = 1;
            write_unlock_bh(&xfrm_policy_lock);
    }

    void selinux_xfrm_notify_policyload_post(void)
    {
            write_lock_bh(&xfrm_policy_lock);
            flow_cache_ignore = 0;
            write_unlock_bh(&xfrm_policy_lock);
    }

Cheers,
Kyle Moffett

BEGIN QUOTED CODE INVOLVING flow_cache_genid:

include/net/flow.h:94:

    extern atomic_t flow_cache_genid;

net/core/flow.c:39:

    atomic_t flow_cache_genid = ATOMIC_INIT(0);

net/core/flow.c:169:flow_cache_lookup():

    if (flow_hash_rnd_recalc(cpu))
            flow_new_hash_rnd(cpu);
    hash = flow_hash_code(key, cpu);

    head = &flow_table(cpu)[hash];
    for (fle = *head; fle; fle = fle->next) {
            if (fle->family == family &&
                fle->dir == dir &&
                flow_key_compare(key, &fle->key) == 0) {
                    if (fle->genid == atomic_read(&flow_cache_genid)) {
                            void *ret = fle->object;

                            if (ret)
                                    atomic_inc(fle->object_ref);
                            local_bh_enable();

                            return ret;
                    }
                    break;
            }
    }

net/xfrm/xfrm_policy.c:1025:

    int xfrm_policy_delete(struct xfrm_policy *pol, int dir)
    {
            write_lock_bh(&xfrm_policy_lock);
            pol = __xfrm_policy_unlink(pol, dir);
            write_unlock_bh(&xfrm_policy_lock);
            if (pol) {
                    if (dir < XFRM_POLICY_MAX)
                            atomic_inc(&flow_cache_genid);
                    xfrm_policy_kill(pol);
                    return 0;
            }
            return -ENOENT;
    }

net/ipv6/inet6_connection_sock.c:142:

    static inline void __inet6_csk_dst_store(struct sock *sk,
                                             struct dst_entry *dst,
                                             struct in6_addr *daddr,
                                             struct in6_addr *saddr)
    {
            __ip6_dst_store(sk, dst, daddr, saddr);

    #ifdef CONFIG_XFRM
            {
                    struct rt6_info *rt = (struct rt6_info *)dst;
                    rt->rt6i_flow_cache_genid = atomic_read(&flow
Re: [PATCH] Allow NBD to be used locally
Whoops, only hit "Reply" on the first email, sorry Jan.

On Feb 2, 2008 7:54 PM, Jan Engelhardt <[EMAIL PROTECTED]> wrote:
> On Feb 2 2008 18:31, [EMAIL PROTECTED] wrote:
> >
> >> How will that work? Fuse makes up a filesystem - not helpful
> >> if you have a raw disk without a known fs to mount.
> >
> >take zfs-fuse or ntfs-3g for example.
> >you have a blockdevice or backing-file containing data structures and fuse
> >makes those show up as a filesystem.
> >i think vmware-mount is not different here.
>
> vmware-mount IS different, it provides the _block_ device,
> which is then mounted through the usual mount(2) mechanism
> (if there is a filesystem driver for it).

As far as I can tell, vmware-mount should be re-implemented as a little perl script around "dmsetup" and/or "losetup", possibly with "dm-userspace" patched into the kernel to allow you to handle non-mapped blocks in your userspace daemon when somebody tries to access them. If you don't need that ability then straight dm-loop and dm-linear will work.

Cheers,
Kyle Moffett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[RFC] Best method to control a "transmit-only" mode on fiber NICs (specifically sky2)
Hi,

The company I'm working for has an unusual fiber NIC configuration that we use for one of our network appliances. We connect only a single fiber from the TX port on one NIC to the RX port on another NIC, providing a physically-one-way path for enhanced security. Unfortunately this doesn't work with most NIC drivers, as even with auto-negotiation off they look for link probe pulses before they consider the link "up" and are willing to send packets.

We have been able to use Myricom 10GigE NICs with a custom firmware image. More recently we have patched the sky2 driver to turn on the FIB_FORCE_LNK flag in the PHY control register; this seems to work on the Marvell-chipset boards we have here.

What would be the preferred way to control this "force link" flag? Right now we are accessing it using ethtool; we have added an additional "duplex" mode: "DUPLEX_TXONLY", with a value of 2. When you specify a speed and turn off autonegotiation ("./patched-ethtool -s eth2 speed 1000 autoneg off duplex txonly"), it will turn on the specified bit in the PHY control register and the link will automatically come up.

We also have one related bug-fix^Wdirty hack for sky2 to reset the PHY a second time during netif-up after enabling interrupts; otherwise the immediate "link up" interrupt gets lost. Once I get approval from the company I will post the patch itself for review.

I look forward to your comments and suggestions.

Cheers,
Kyle Moffett
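To make the proposed interface concrete, here is a rough userspace sketch of how the extra duplex value might be parsed and acted on. It is an illustration only: DUPLEX_HALF and DUPLEX_FULL mirror the values in <linux/ethtool.h>, DUPLEX_TXONLY = 2 is the out-of-tree value described in the post, and parse_duplex() and want_force_link() are hypothetical helpers that exist in neither ethtool nor sky2.

```c
#include <string.h>

/* DUPLEX_HALF/DUPLEX_FULL match <linux/ethtool.h>; DUPLEX_TXONLY is the
 * extra, non-mainline value described in the post. */
enum duplex_mode {
    DUPLEX_HALF   = 0,
    DUPLEX_FULL   = 1,
    DUPLEX_TXONLY = 2,
};

/* Hypothetical helper: map the "duplex" argument to a mode, -1 if unknown. */
static int parse_duplex(const char *arg)
{
    if (strcmp(arg, "half") == 0)
        return DUPLEX_HALF;
    if (strcmp(arg, "full") == 0)
        return DUPLEX_FULL;
    if (strcmp(arg, "txonly") == 0)
        return DUPLEX_TXONLY;
    return -1;
}

/* Hypothetical driver-side decision: force the link up only when
 * autonegotiation is off and the transmit-only mode was requested. */
static int want_force_link(int autoneg, int duplex)
{
    return !autoneg && duplex == DUPLEX_TXONLY;
}
```

Overloading the duplex field keeps the ethtool structure unchanged, at the cost of giving "duplex" a value that is not really a duplex setting; a separate flag would be the obvious alternative to discuss.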
Re: [PATCH] target: Update copyright ownership to 2012
On Fri, Nov 9, 2012 at 3:00 PM, Nicholas A. Bellinger wrote:
> This patch to update copyright year to current for principal target core
> ownership is now being pushed into target-pending/for-next.

Pardon me, but you were just publicly accused of violating the GPL, so your response is to send a patch removing the copyright notices of all other organizations from the SCSI-target code?

Have you obtained ownership of all the relevant copyrights for Linux-iSCSI.org, PyX Technologies, Inc, and SBE, Inc? If not, then this patch is an attempted violation of those organizations' copyrights and of the GPL (which requires that you preserve copyright notices).

Further, while these notices are the only ones listed in those files, they are not the only individuals outside of RisingTide Systems which have significant copyright interest in this code. If your goal is to obtain exclusive copyright ownership over this code then there are a great many other people you must contact and convince first.

I would encourage you to talk privately with the Software Freedom Conservancy before sending more patches of this nature.

Cheers,
Kyle Moffett

> diff --git a/drivers/target/target_core_alua.c b/drivers/target/target_core_alua.c
> - * Copyright (c) 2009-2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_configfs.c b/drivers/target/target_core_configfs.c
> - * Copyright (c) 2008-2011 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c
> - * Copyright (c) 2003, 2004, 2005 PyX Technologies, Inc.
> - * Copyright (c) 2005-2006 SBE, Inc. All Rights Reserved.
> - * Copyright (c) 2008-2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_fabric_configfs.c b/drivers/target/target_core_fabric_configfs.c
> - * Copyright (c) 2010,2011 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_fabric_lib.c b/drivers/target/target_core_fabric_lib.c
> - * Copyright (c) 2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_file.c b/drivers/target/target_core_file.c
> - * Copyright (c) 2005 PyX Technologies, Inc.
> - * Copyright (c) 2005-2006 SBE, Inc. All Rights Reserved.
> - * Copyright (c) 2008-2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_hba.c b/drivers/target/target_core_hba.c
> - * Copyright (c) 2003, 2004, 2005 PyX Technologies, Inc.
> - * Copyright (c) 2005, 2006, 2007 SBE, Inc.
> - * Copyright (c) 2008-2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
> - * Copyright (c) 2003, 2004, 2005 PyX Technologies, Inc.
> - * Copyright (c) 2005, 2006, 2007 SBE, Inc.
> - * Copyright (c) 2008-2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_pr.c b/drivers/target/target_core_pr.c
> - * Copyright (c) 2009, 2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
> - * Copyright (c) 2003, 2004, 2005 PyX Technologies, Inc.
> - * Copyright (c) 2005, 2006, 2007 SBE, Inc.
> - * Copyright (c) 2008-2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_rd.c b/drivers/target/target_core_rd.c
> - * Copyright (c) 2003, 2004, 2005 PyX Technologies, Inc.
> - * Copyright (c) 2005, 2006, 2007 SBE, Inc.
> - * Copyright (c) 2008-2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c
> - * Copyright (c) 2002, 2003, 2004, 2005 PyX Technologies, Inc.
> - * Copyright (c) 2005, 2006, 2007 SBE, Inc.
> - * Copyright (c) 2008-2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_spc.c b/drivers/target/target_core_spc.c
> - * Copyright (c) 2002, 2003, 2004, 2005 PyX Technologies, Inc.
> - * Copyright (c) 2005, 2006, 2007 SBE, Inc.
> - * Copyright (c) 2008-2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_stat.c b/drivers/target/target_core_stat.c
> - * Copyright (c) 2011 Linux-iSCSI.org
> - * Copyright (c) 2006-2007 SBE, Inc. All Rights Reserved.
> diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
> - * Copyright (c) 2009,2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_tpg.c b/drivers/target/target_core_tpg.c
> - * Copyright (c) 2002, 2003, 2004, 2005 PyX Technologies, Inc.
> - * Copyright (c) 2005, 2006, 2007 SBE, Inc.
> - * Copyright (c) 2008-2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> - * Copyright (c) 2002, 2003, 2004, 2005 PyX Technologies, Inc.
> - * Copyright (c) 2005, 2006, 2007 SBE, Inc.
> - * Copyright (c) 2008-2010 Linux-iSCSI.org
> diff --git a/drivers/target/target_core_ua.c b/drivers/target/target_core_ua.c
> - * Copyright (c) 2009,2010 Linux-iSCSI.org
Re: Use of C99 int types
On Apr 04, 2005, at 17:25, Richard B. Johnson wrote:
> I don't find stdint.h in the kernel source (up to 2.6.11). Is this
> going to be a new addition?

Uhh, no. stdint.h is part of glibc, not the kernel.

> It would be very helpful to start using the uint(8,16,32,64)_t types
> because they are self-evident, a lot more than size_t or, my favorite
> wchar_t.

You miss the point of size_t and ssize_t/ptrdiff_t. They are types guaranteed to be at least as big as the pointer size. uint8/16/32/64, on the other hand, are specific bit-sizes, which may not be as fast or correct as a simple size_t.

Linus has pointed out that while it doesn't matter which of __u32, u32, uint32_t, etc you use for kernel private interfaces, you *cannot* use anything other than __u32 in the parts of headers that userspace will see, because __u32 is defined only by the kernel and so there is no risk for conflicts, as opposed to uint32_t, which is also defined by libc, resulting in collisions in naming.

Cheers,
Kyle Moffett

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$ L(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+ PGP+++ t+(+++) 5 X R? tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r !y?(-)
------END GEEK CODE BLOCK------
Re: Use of C99 int types
On Apr 05, 2005, at 05:23, Renate Meijer wrote:
>> uint8/16/32/64, on the other hand, are specific bit-sizes, which
>> may not be as fast or correct as a simple size_t.
>
> Using specific widths may yield benefits on one platform, whilst
> proving a real bottleneck when porting something to another. A
> potential of problems easily avoided by using plain-vanilla integers.

The point of specific-width integers is to preserve a specific binary format, such as a filesystem on-disk data structure, or a kernel-userspace ABI, etc. If you just need a number, use a different type.

> Strictly speaking, a definition starting with a double underscore
> is reserved for use by the compiler and associated libs

Well, _strictly_speaking_, it's "implementation defined", where the "implementation" includes the kernel (due to the syscall interface).

> thus such a declaration would invade implementation namespace. The
> compilers implementation, that is.

But the C library is implicitly dependent on the kernel headers for a wide variety of datatypes.

> In this case, the boundary is a bit vague, i see that, since a lot of
> header definitions also reside in the /usr/include hierarchy.

Some of which are produced by kernel sources: /usr/include/linux, /usr/include/asm, etc.

> I think it would be useful to at least *agree* on a standard type for
> 8/16/32/64-bit integer types. What I see now as a result of grepping
> for 'uint32' is a lot more confusing than stdint.h

Well, Linus has maintained that there is no standard, except where the ABI is concerned; there we must use __u32 so that it does not clash with libc or user programs.

> Especially the types with leading underscores look cool, but in reality
> may cause a conflict with compiler internals and should only be used
> when defining compiler libraries.

It's "implementation" (kernel+libc+gcc) defined. It just means that gcc, the kernel, and libc have to be much more careful not to tread on each other's toes.

> The '__' have explicitly been put in by ISO in order to avoid
> conflicts between user-code and the standard libraries,

The "standard libraries" includes the syscall interface here. If the kernel types could not be prefixed with __, then what _should_ we prefix them with?

> Furthermore, I think it's wise to convince the community that if not
> needed, integers should not be specified by any specific width.

That doesn't work for an ABI. If you switch compilers (or from 32-bit to 64-bit, like from x86 to x86-64), you _must_ be able to specify certain widths for all the ABI numbers to preserve compatibility.

Cheers,
Kyle Moffett
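As a small illustration of "preserving a specific binary format", the sketch below stores a 32-bit field in a byte buffer the same way on every platform. It is a userspace example under stated assumptions: it uses uint32_t from <stdint.h> (kernel headers visible to userspace would spell this __u32, per the discussion above), the little-endian byte order is an arbitrary choice for the example, and put_le32()/get_le32() are hypothetical helper names.

```c
#include <stdint.h>

/* Write a 32-bit value as exactly four bytes, least significant first.
 * The result does not depend on the host's int size or byte order. */
static void put_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v & 0xff);
    out[1] = (uint8_t)((v >> 8) & 0xff);
    out[2] = (uint8_t)((v >> 16) & 0xff);
    out[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Read the same four bytes back into a 32-bit value. */
static uint32_t get_le32(const uint8_t in[4])
{
    return (uint32_t)in[0] |
           ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) |
           ((uint32_t)in[3] << 24);
}
```

With a plain int or long in place of uint32_t, the number of bytes to read and write would depend on the compiler and target, which is exactly what an on-disk structure or ABI cannot tolerate.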
Re: Use of C99 int types
On Apr 05, 2005, at 08:18, Richard B. Johnson wrote:
> One cannot just use 'int' or 'long', in particular when interfacing
> with an operating system. For example, look at the socket interface
> code. Parameters are put into an array of longs and a pointer to this
> array is passed to the socket interface. It's a mess when converting
> this code to 64-bit world.

Exactly.

> If originally one used a structure of the correct POSIX integer types,
> and a pointer to the structure was passed, then absolutely nothing in
> the source-code would have to be changed at all when compiling that
> interface for a 64-bit machine.

But you _can't_ use the POSIX integer types. When compiling the kernel, if you use the types, you must define them in the kernel headers. On the other hand, when compiling userspace stuff, you _can't_ have them defined in the kernel headers because libc also defines them. The solution is to use __{s,u}{8,16,32,64}, which are _only_ defined by the kernel, not by libc or gcc, and can be therefore used in the ABI.

> The continual short-cuts, with the continual "special-case" hacks is
> what makes porting difficult. That's what the POSIX types was supposed
> to help prevent.

Except the POSIX types themselves are not usable for the boundary code for the reasons of double definition. Google for Linus' posts on this topic a couple months ago.

> That's why I think if there was a stdint.h file in the kernel, when
> people were performing maintenance or porting their code, they could
> start using those types.

The types _are_ available from the kernel headers, but only when compiling with __KERNEL__, to avoid conflicts from the libc definitions.

Cheers,
Kyle Moffett
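The "array of longs" problem from the exchange above can be shown with sizeof alone. This is a hedged userspace sketch, not the real socketcall code: legacy_args, struct sized_args and legacy_matches_32bit_caller() are hypothetical names, and uint32_t/uint64_t stand in for the kernel's __u32/__u64.

```c
#include <stddef.h>
#include <stdint.h>

/* Three arguments packed as an array of longs: each element is as wide
 * as "long" happens to be on the ABI this file is compiled for. */
typedef unsigned long legacy_args[3];

/* The same three arguments as explicitly sized fields. */
struct sized_args {
    uint32_t fd;
    uint32_t len;
    uint64_t buf;   /* user pointer carried as 64 bits on every ABI */
};

/* Nonzero if code built for this ABI could read a 32-bit caller's
 * array of three longs (4 bytes each) without any translation. */
static int legacy_matches_32bit_caller(void)
{
    return sizeof(legacy_args) == 3 * sizeof(uint32_t);
}
```

On a 64-bit build the array form no longer matches what a 32-bit caller wrote, while struct sized_args stays 16 bytes with the same field offsets either way.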
Re: Use of C99 int types
h with my user-defined types.

> Anything you like. 'kernel_' or simply 'k_' would be appropriate. As
> long as you do not invade compiler namespace. It is separated and
> uglyfied for a purpose.

But the _entire_ non _ namespace is reserved for anything user programs want to do with it. I think most of the kernel types in the current headers use __kernel_, which is safe enough.

> Does not work when you are touching externally defined interfaces in
> general, including that of a CPU. There are places for uint32_t and
> friends and even for __uint32_t and its kin, but abusing them will
> cause trouble in a world that is accommodating more than one
> register-size. This is all I am saying.

But in a world with more than one register size, you _must_ use them, for example, the x86-64 code uses them to handle 32-bit backwards compatibility, and the ppc64 code does likewise. When a program compiled as ppc32 gets run on my ppc64 box, the kernel understands that anything pushed onto the stack as arguments is 32-bit, and must use specifically sized types to handle that properly.

Cheers,
Kyle Moffett
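The 32-bit-program-on-64-bit-kernel case can be sketched as an explicit widening step. The code below is an assumption-laden userspace illustration, not the kernel's actual compat layer: struct iovec32, struct iovec_native and widen_iovec() are hypothetical names chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout a 32-bit program would use: both fields are exactly 32 bits. */
struct iovec32 {
    uint32_t base;  /* user pointer, carried as a 32-bit integer */
    uint32_t len;
};

/* Layout native to whatever ABI this file is compiled for. */
struct iovec_native {
    void  *base;
    size_t len;
};

/* Widen the 32-bit form into the native form, field by field. */
static struct iovec_native widen_iovec(struct iovec32 in)
{
    struct iovec_native out;

    out.base = (void *)(uintptr_t)in.base;
    out.len  = in.len;
    return out;
}
```

The conversion is only possible because struct iovec32 pins each field to a known width; if it were declared with pointers and longs, its layout would silently follow the 64-bit build and no longer describe what the 32-bit program actually passed.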
Re: Use of C99 int types
On Apr 06, 2005, at 07:41, Renate Meijer wrote:
> On Apr 6, 2005, at 12:11 AM, Kyle Moffett wrote:
>> Please don't remove Linux-Kernel from the CC, I think this is an
>> important discussion.

GAAH!!! Read my lips!!! Quit removing Linux-Kernel from the CC!!!

> As I see it, there are a number of issues
>
> - Use of double underscores invades compiler namespace (except in those
> cases where kernel definitions end up as the basis for definitions in
> /usr/include/*, i.e. those that actually are part of the
> C-implementation for Linux). It is these that I'm talking about.

This is exactly my point (The cases where the kernel definitions are part of /usr/include).

> - Some type that does not conflict with compiler namespace to replace
> the variety of definitions for e.g. 32-bit unsigned integers we have
> now.

As I said, I don't care about this, so do whatever you want.

> - Removal of anything prefixed with a double underscore from
> non-C-implementation files.

ATM, much of the stuff in include/linux and include/asm-* is considered "C-implementation" because it is used from userspace. If you want to clean that up and start moving abi files to include/kernel-abi or somesuch, feel free, but that's a lot of work.

>> Personally, I don't care what you feel like requiring for purely
>> in-kernel interfaces, but __{s,u}{8,16,32,64} must stay to avoid
>> namespace collisions with glibc in the kernel include files as used
>> by userspace.
>
> Aye, but as I have pointed out several times, these types should be
> restricted to those files and *only* those files which eventually end
> up in the compilers includes. In every other place, they invite exactly
> the trouble they are intended to avoid.

Precisely. So if you want to make the millions of patches, go right ahead, be my guest. :-P Until somebody steps forward to clean up the huge mess, nothing will get done.

> So in every place except those files which may actually cause a
> namespace conflict or a bug because some newer version does not support
> __foobar, or changed the semantics. Since using any __foobar type
> implies relying on the compiler internals, which may change without
> prior notice, it is ipso facto undesirable.

Except the kernel wants to be optimized and work and use what features are available. The kernel uses __foobar stuff provided by the compiler because it has gccX.h files specifically designed to take compiler interfaces, provide backups when they don't exist, and use them (and their better checking) when they do.

>> This is kinda arguing semantics, but:
>> A particular set of software (linux+libc+gcc), running in a particular
>> translation environment (userspace) under particular control options
>> (Signals, nice values, etc), that performs translation of programs for
>> (emulating missing instructions), and supports execution of functions
>> (syscalls) in, a particular execution environment (also userspace).
>
> Ok. And where exactly are linux and libc when compiling code for an
> Atmel ATmega32 (40 pin DIL) using gcc?

Where do you get Atmel ATmega32 from? I _only_ care about what symbols Linux can use, and as I've mentioned, when running under *Linux*, then it just so happens that *Linux* is part of my implementation, therefore the *Linux* sources, which by definition aren't used elsewhere, can assume they are part of said implementation.

> The 'set of software' does *not* include any OS. Not Windows, not
> Linux, not MacOSX, since the whole thing might be directed at a lowly
> microcontroller, which DOES NOT HAVE ANY OPERATING SYSTEM WHATSOEVER.
> Nevertheless, gcc works fine.

This is unrelated and off topic. Heck, you've even consented above that Linux can use

>> Without the kernel userspace wouldn't have anything, because anything
>> syscall-related (which is basically everything) involves the kernel.
>
> Sure. The same goes for every other program. However, it would be
> pretty stoopid to say the kernel is an integral part of (say) the Gimp.
> More so, since the Gimp and GCC run on completely different
> architectures as well. By the same token, linux is part of XFree86
> despite the fact XFree86 does not require linux to run.

But an XFree86 binary compiled on FreeBSD, or a GIMP binary compiled on FreeBSD, for the most part, will not run on Linux, because the compiler uses the _Linux_ environment to build the binary, including the _Linux_ headers and such. The built binary is nearly useless without Linux, but not vice-versa, hence even though the binary is not a derivative work of linux, it requires it to run.

>> Heck, the kernel and its ABI is _more_ a part of the implementation
>> than glibc is! I can write an assembly program that doesn't link to
>> or use libc, but without using syscalls I can do nothing whatsoever.
>
> I can write entire applications using gcc without even thinking of
> using any 'syscall' or any other part of linux/bsd/whatever. Still...
> it's gcc.

Uhh, what exactly is your application going to do? So
Re: RFC: turn kmalloc+memset(,0,) into kcalloc
On Apr 06, 2005, at 11:50, Paulo Marques wrote:
> kzalloc it is, then.
> [...]
> So we gain 8kB on the uncompressed image and 1347 bytes on the
> compressed one. This was just a dumb test and actual results might be
> better due to smarter human cleanups.
>
> Not a spectacular gain per se, but the increase in code readability is
> still worth it, IMHO.

Perhaps this could eventually be modified to draw from a prezeroed block of memory, similar to the current code for doing the same thing for userspace. It probably wouldn't give much performance gain, especially since it's not used for large blocks or large numbers of small objects (As you would use a slabcache for those), but it might help a bit. Of course, the code would need to fall back quickly if such an allocation would be messy or expensive for any reason.

Cheers,
Kyle Moffett
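For readers who have not followed the thread, the cleanup under discussion replaces an allocate-then-memset pair with a single zeroing call. The following is a userspace analogue only, assuming malloc()/calloc() in place of the kernel allocators; alloc_then_clear() and zalloc() are hypothetical names, not the kernel functions.

```c
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for the kmalloc() + memset(, 0, ) pattern. */
static void *alloc_then_clear(size_t size)
{
    void *p = malloc(size);

    if (p)
        memset(p, 0, size);
    return p;
}

/* Userspace stand-in for a kzalloc()-style helper: one call that hands
 * back zeroed memory, so callers cannot forget the memset. */
static void *zalloc(size_t size)
{
    return calloc(1, size);
}
```

Funnelling every zeroed allocation through one helper is also what would make the prezeroed-block idea above possible later, since only the helper's body would need to change.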
[Long OT] Re: non-free firmware in kernel modules, aggregation and unclear copyright notice.
This thread should probably get moved off-list soon, it's like beating the dead horse long after its flesh has decayed and its bones disintegrated to dust. On Apr 13, 2005, at 21:54, David Schwartz wrote: On Tue, Apr 12, 2005 at 12:05:59PM -0700, David Schwartz wrote: Yes, the GPL can give you rights you wouldn't otherwise have. A EULA can take away rights you would otherwise have. What compels you to agree with an EULA? If you do not agree with the EULA, you cannot and do not acquire lawful possession of the work. Of course, one could always assert the following: 1) I went to a store 2) I found a box 3) I went to the cash register 4) I gave money to the cashier for the box 5) I took the box home 6) I opened the box and took out the contents Now, to the end user, the above is the same procedure for purchasing a box of cereal or a piece of software, therefore the restrictions are the same. I'm not allowed to distribute the copyrightable materials, which for a cereal box is the images on the box, and for a CD is the digital data stored therein. Other than that, I can take a hammer and smash my CD/cereal, I can make a dozen copies of the CD/box-art and mount them on the wall or burn them, both of which are symbolic speech. I can make backup copies of my cereal box-art/CD too. At what point of the above did I agree to any license? As far as I know, a license (IE: contract) is not valid for a product unless made at the point-of-sale, before exchanging money. This is especially valid since almost all computer retailers refuse refunds for opened software. When you have to open the box to see the license, that's bad, but when, as I've seen far too many times, you have to break the seal and insert the CD to even _see_ the license, it cannot be valid. The only real point of most of the EULAs is to protect the owners copyright, which is implicitly protected in any case. 
Cheers, Kyle Moffett
SCSI opcode 0x80 and 3ware Escalade 7000 ATA RAID
"0\n", 2) = 2 write(1, "201 Soft_Read_Error_Rate", 28) = 28 write(1, "0x000a 253 251 000Old_"..., 59) = 59 write(1, "5\n", 2) = 2 write(1, "202 TA_Increase_Count ", 28) = 28 write(1, "0x000a 253 252 000Old_"..., 59) = 59 write(1, "0\n", 2) = 2 write(1, "203 Run_Out_Cancel ", 28) = 28 write(1, "0x000b 253 252 180Pre-"..., 59) = 59 write(1, "0\n", 2) = 2 write(1, "204 Shock_Count_Write_Opern ", 28) = 28 write(1, "0x000a 253 252 000Old_"..., 59) = 59 write(1, "0\n", 2) = 2 write(1, "205 Shock_Rate_Write_Opern ", 28) = 28 write(1, "0x000a 253 252 000Old_"..., 59) = 59 write(1, "0\n", 2) = 2 write(1, "207 Spin_High_Current ", 28) = 28 write(1, "0x002a 252 252 000Old_"..., 59) = 59 write(1, "0\n", 2) = 2 write(1, "208 Spin_Buzz ", 28) = 28 write(1, "0x002a 252 252 000Old_"..., 59) = 59 write(1, "0\n", 2) = 2 write(1, "209 Offline_Seek_Performnce ", 28) = 28 write(1, "0x0024 196 191 000Old_"..., 59) = 59 write(1, "0\n", 2) = 2 write(1, " 99 Unknown_Attribute ", 28) = 28 write(1, "0x0004 253 253 000Old_"..., 59) = 59 write(1, "0\n", 2) = 2 write(1, "100 Unknown_Attribute ", 28) = 28 write(1, "0x0004 253 253 000Old_"..., 59) = 59 write(1, "0\n", 2) = 2 write(1, "101 Unknown_Attribute ", 28) = 28 write(1, "0x0004 253 253 000Old_"..., 59) = 59 write(1, "0\n", 2) = 2 write(1, "\n", 1) = 1 ioctl(3, FIBMAP, 0xbfffe290)= 0 write(1, "SMART Error Log Version: 1\n", 27) = 27 write(1, "ATA Error Count: 1\n", 19)= 19 write(1, "\tCR = Command Register [HEX]\n\tFR"..., 490) = 490 write(1, "Error 1 occurred at disk power-o"..., 77) = 77 write(1, " When the command that caused t"..., 88) = 88 write(1, " After command completion occur"..., 121) = 121 write(1, " Error: UNC 3 sectors at LBA = "..., 51) = 51 write(1, "\n\n", 2) = 2 write(1, " Commands leading to the comman"..., 194) = 194 write(1, " 25 00 08 bb bc 0c e0 00 23d+0"..., 58) = 58 write(1, " 25 00 10 83 bc 0c e0 00 23d+0"..., 58) = 58 write(1, " 25 00 08 3b bc 0c e0 00 23d+0"..., 58) = 58 write(1, " 25 00 08 33 bc 0c e0 00 
23d+0"..., 58) = 58 write(1, " 25 00 08 13 bc 0c e0 00 23d+0"..., 58) = 58 write(1, "\n", 1) = 1 ioctl(3, FIBMAP, 0xbfffe2a0)= 0 write(1, "SMART Self-test log structure re"..., 48) = 48 write(1, "Num Test_DescriptionStatus "..., 96) = 96 write(1, "# 1 Short offline Complet"..., 79) = 79 write(1, "# 2 Short offline Complet"..., 79) = 79 write(1, "# 3 Short offline Complet"..., 79) = 79 write(1, "# 4 Short offline Complet"..., 79) = 79 write(1, "# 5 Short offline Complet"..., 79) = 79 write(1, "# 6 Extended offlineComplet"..., 79) = 79 write(1, "# 7 Short offline Complet"..., 79) = 79 write(1, "# 8 Short offline Complet"..., 79) = 79 write(1, "# 9 Short offline Complet"..., 79) = 79 write(1, "#10 Short offline Complet"..., 79) = 79 write(1, "#11 Short offline Complet"..., 79) = 79 write(1, "#12 Short offline Complet"..., 79) = 79 write(1, "#13 Extended offlineComplet"..., 79) = 79 write(1, "#14 Short offline Complet"..., 79) = 79 write(1, "#15 Short offline Complet"..., 79) = 79 write(1, "#16 Short offline Complet"..., 79) = 79 write(1, "#17 Short offline Complet"..., 79) = 79 write(1, "#18 Short offline Complet"..., 79) = 79 write(1, "#19 Short offline Complet"..., 79) = 79 write(1, "#20 Extended offlineComplet"..., 79) = 79 write(1, "#21 Short offline Complet"..., 79) = 79 write(1, "\n", 1) = 1 ioctl(3, FIBMAP, 0xbfffe290)= 0 write(1, "SMART Selective self-test log da"..., 63) = 63 write(1, " SPAN MIN_LBA MAX_LBA CURRENT"..., 45) = 45 write(1, "100 Not_tes"..., 37) = 37 write(1, "200 Not_tes"..., 37) = 37 write(1, "300 Not_tes"..., 37) = 37 write(1, "400 Not_tes"..., 37) = 37 write(1, "500 Not_tes"..., 37) = 37 write(1, "Selective self-test flags (0x0):"..., 33) = 33 write(1, " After scanning selected spans,"..., 69) = 69 write(1, "If Selective self-test is pendin"..., 76) = 76 write(1, "\n", 1) = 1 munmap(0x40018000, 4096)= 0 exit_group(64) = ? 
Cheers, Kyle Moffett
Re: SCSI opcode 0x80 and 3ware Escalade 7000 ATA RAID
On Apr 15, 2005, at 18:50, adam radford wrote:

Make sure you are using the 3ware character ioctl interface at /dev/twe0 (dynamic major, controller number minor) for your smartmontools, not /dev/sda.

Hmm, I don't have any /dev/twe* here. I _do_ have hotplug, udev, etc., installed, and this is a 2.6 machine, so I'm not sure what could be wrong. How recent was this change?

The old interface from smartmontools used SCSI_IOCTL_SEND_COMMAND ioctls with a special passthru opcode of 0x80 that would get passed to the driver. This interface is deprecated in the driver and the kernel.

OK. Now if only I could find it. Is there anyplace in sysfs that I can check manually to see what the dynamic major is? I'd like to try creating the device by hand if I can't get Debian hotplug to see it.

Cheers, Kyle Moffett
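For the "what is the dynamic major" question, one place to look besides sysfs is /proc/devices, which lists every registered character and block driver with its major number. The following is only a minimal sketch: `find_char_major` is a hypothetical helper, and the driver name "twe" is an assumption taken from the /dev/twe0 name mentioned above, so check what your own /proc/devices actually shows.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan text in /proc/devices format and return the
 * major number registered for the character device `name`, or -1 if it
 * is not listed in the "Character devices:" section.
 */
int find_char_major(FILE *fp, const char *name)
{
	char line[256];
	int in_char_section = 0;

	while (fgets(line, sizeof(line), fp)) {
		int major;
		char dev[64];

		if (strncmp(line, "Character devices:", 18) == 0) {
			in_char_section = 1;
			continue;
		}
		if (strncmp(line, "Block devices:", 14) == 0) {
			in_char_section = 0;
			continue;
		}
		/* Entries look like "254 twe": a number, then the name. */
		if (in_char_section &&
		    sscanf(line, "%d %63s", &major, dev) == 2 &&
		    strcmp(dev, name) == 0)
			return major;
	}
	return -1;
}
```

Called on an open /proc/devices with the name "twe", a non-negative result is the major to pass to mknod for a character device, with the controller number as the minor.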
Re: kernel guide to space
On Jul 13, 2005, at 21:12:08, [EMAIL PROTECTED] wrote:

I don't think there's a strict 80 column rule anymore. It's 2005...

Think again. There are a lot of people who use 80 column windows so that we can see two code windows side-by-side.

Agreed. If you're having trouble with width, it's a sign that the code needs to be refactored. Also, my personal rule is that if a source file exceeds 1000 lines, start looking for a way to split it. It can go longer (indeed, there is little reason to split the fs/nls/nls_cp9??.c files), but (I will refrain from discussing drivers/scsi/advansys.c)

A simple set of code refactoring rules that I try to abide by:

1) If a function is more than a few 25 or 40 line screens, it's likely too big (unless a big switch statement or a list of initialization calls or something). If necessary, use static inline functions to factor out repetitive behavior.

2) If a file is more than 30-40 functions, it's likely too big, and you should try to split it. It's _ok_ to have 4 source files implementing code for manipulating a single struct.

3) If a normal line of code is more than 80 characters, one of the following is probably true: you need to break the line up and use temps for clarity, or your function is so big that you're tabbing over too far.

Cheers, Kyle Moffett
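As an illustration of rule 3, here is a small sketch of "break the line up and use temps". The struct and function names are made up for the example and do not come from any real driver.

```c
#include <stddef.h>

/* Hypothetical types, purely for illustration. */
struct port_stats {
	unsigned long rx_bytes;
	unsigned long tx_bytes;
};

struct adapter {
	struct port_stats ports[4];
};

/* Before: the prototype and the expression both run past 80 columns. */
unsigned long total_bytes_long(const struct adapter *adapters, size_t idx, size_t port)
{
	return adapters[idx].ports[port].rx_bytes + adapters[idx].ports[port].tx_bytes;
}

/* After: a temporary names the sub-object once and every line fits. */
unsigned long total_bytes(const struct adapter *adapters, size_t idx,
			  size_t port)
{
	const struct port_stats *st = &adapters[idx].ports[port];

	return st->rx_bytes + st->tx_bytes;
}
```

Both functions compute the same value; the second one just spells out the repeated lookup once, which is usually what a too-long line is hiding.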
Re: kernel guide to space
On Jul 20, 2005, at 20:45:21, Paul Jackson wrote:

drivers/scsi/BusLogic.c:

    %2d %5d %5d %5d%5d %5d %5d %5d %5d %5d\n", TargetID,
    TargetStatistics[TargetID].CommandAbortsRequested,
    TargetStatistics[TargetID].CommandAbortsAttempted,
    TargetStatistics[TargetID].CommandAbortsCompleted,
    TargetStatistics[TargetID].BusDeviceResetsRequested,
    TargetStatistics[TargetID].BusDeviceResetsAttempted,
    TargetStatistics[TargetID].BusDeviceResetsCompleted,
    TargetStatistics[TargetID].HostAdapterResetsRequested,
    TargetStatistics[TargetID].HostAdapterResetsAttempted,
    TargetStatistics[TargetID].HostAdapterResetsCompleted);

Ugh!!! From CodingStyle (although this is not always followed):

The limit on the length of lines is 80 columns and this is a hard limit. Statements longer than 80 columns will be broken into sensible chunks. Descendants are always substantially shorter than the parent and are placed substantially to the right. The same applies to function headers with a long argument list. Long strings are as well broken into shorter strings. [example relevant to the above code snipped]

Also:

C is a Spartan language, and so should your naming be. Unlike Modula-2 and Pascal programmers, C programmers do not use cute names like ThisVariableIsATemporaryCounter. A C programmer would call that variable "tmp", which is much easier to write, and not the least more difficult to understand. [...] mixed-case names are frowned upon [...]

*cough* TargetStatistics[TargetID].HostAdapterResetsCompleted *cough*

I suspect Linus would be willing to accept a few cleanup patches for the BusLogic.c file. Perhaps even one that renames BusLogic.c to buslogic.c like all the other files in the source tree, instead of using nasty StudlyCaps all over :-D

Cheers, Kyle Moffett
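To make the CodingStyle points concrete, here is a rough sketch of what a cleaned-up statistics row could look like. Everything in it is hypothetical: `tgt_stats`, its field names and `format_tgt_row` are invented for illustration and are not the driver's identifiers, and the format string is only an example since the quoted snippet is truncated.

```c
#include <stdio.h>

/* Hypothetical, shortened counterpart of the per-target statistics. */
struct tgt_stats {
	unsigned int aborts_req, aborts_try, aborts_done;
	unsigned int resets_req, resets_try, resets_done;
};

/*
 * Format one row of counters. Adjacent string literals are concatenated
 * by the compiler, so the long format string can be split into chunks
 * that each stay under 80 columns.
 */
int format_tgt_row(char *buf, size_t len, int id, const struct tgt_stats *st)
{
	return snprintf(buf, len,
			"%2d %5u %5u %5u "
			"%5u %5u %5u\n",
			id,
			st->aborts_req, st->aborts_try, st->aborts_done,
			st->resets_req, st->resets_try, st->resets_done);
}
```

Passing a pointer to one target's struct instead of indexing the array for every argument does most of the shortening; the lowercase names do the rest.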
Re: Regression: radeonfb: No synchronisation on CRT with linux-2.6.13-rc5
On Aug 7, 2005, at 03:51:07, Benjamin Herrenschmidt wrote:

On Fri, 2005-08-05 at 19:38 +0200, Bodo Eggert wrote:

On Fri, 5 Aug 2005, Benjamin Herrenschmidt wrote:

On Fri, 2005-08-05 at 00:03 +0200, Bodo Eggert wrote:

My CRT is out of sync after radeonfb from 2.6.13-rc5 is initialized. 2.6.12 does not show this behaviour.

I'm out of town at the moment, could you maybe diff radeonfb between working & non-working and CC me the diff? I don't have my work stuff at hand nor my kernel images so...

There were no changes in radeonfb.c, but I could trace it to CONFIG_PREEMPT. With _NONE, it works as expected.

Ah! Interesting... I don't see why PREEMPT would affect radeonfb though... Can you try something like wrapping radeon_write_mode() with preempt_disable()/preempt_enable() and tell me if it makes a difference?

I'm having a similar issue with my shiny new 17" Powerbook G4. The radeon chip works fine with framebuffer in 2.6.12.4 _with_ PREEMPT, but not in 2.6.13-rc5 _with_ PREEMPT (configs are virtually identical). I'll try your idea this afternoon when I get the chance. I wonder if perhaps some code in radeonfb is used under the BKL, which is now preemptable (or maybe an ordinary spinlock changed or went away?), because I also set PREEMPT_BKL.

I've got an LCD, and on mine it looks like every third pixel-line gets shifted about 32-64 pixels to the left, and they move with display refresh. My guess is that something is interrupting radeonfb during a critical time in display syncing and forcing the video card to wait too far into the next line before sending pixels.

One other data point: I've seen something like this, except not nearly as bad, in stock Debian 2.6.8 vs. stock Debian 2.6.11 on powerpc. The former exhibits some similar (but not nearly as bad) symptoms (same Powerbook), whereas 2.6.11 doesn't. In that case, neither has PREEMPT. I'll run more tests this afternoon/evening, to try to track it down.
Cheers, Kyle Moffett -- There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult. -- C.A.R. Hoare
Re: Regression: radeonfb: No synchronisation on CRT with linux-2.6.13-rc5
On Aug 7, 2005, at 12:13:38, Benjamin Herrenschmidt wrote:

I've got an LCD, and on mine it looks like every third pixel-line gets shifted about 32-64 pixels to the left, and they move with display refresh. My guess is that something is interrupting radeonfb during a critical time in display syncing and forcing the video card to wait too far into the next line before sending pixels.

radeonfb is mostly inactive after it has set up the framebuffer and unless you actually draw something, in which case, accel code is called. _However_ there is an unrelated problem with some panels, including some of the 17": The panel doesn't always "sync" properly. This seems to be related to some subtle timing issue in the LVDS code but I don't know exactly what yet. You can usually get it back by repeatedly turning the backlight all the way down (which shuts the panel off) and back up until it "catches".

Hmm. This doesn't really fit as my issues are very reproducible. The behaviour under stock Debian 2.6.8 is identical during reboots and after fblevel 0 ; sleep X ; fblevel 15. Likewise, stock 2.6.11, 2.6.12.4, and 2.6.13-rc5, although I'm just getting back to testing things.

Cheers, Kyle Moffett
Re: [PATCH] i386 No-Idle-Hz aka Dynamic-Ticks 5
On Aug 7, 2005, at 19:51:25, Con Kolivas wrote:

On Mon, 8 Aug 2005 02:58, Srivatsa Vaddagiri wrote:

Con, I am afraid until SMP correctness is resolved, then this is not in a position to go in (unless you want to enable it only for UP, which I think should not be our target). I am working on making this work correctly on SMP systems. Hopefully I will post a patch soon.

Great! I wasn't sure what time frame you meant when you last posted. I won't do anything more, leaving this patch as it is, and pass the baton to you.

I'm curious what has happened to the PPC side of the patch. IIRC, someone was working on such a port, but it seems to have been lost along the way at some point. Is there any additional information on that patch?

Cheers, Kyle Moffett -- Unix was not designed to stop people from doing stupid things, because that would also stop them from doing clever things. -- Doug Gwyn
Re: Regression: radeonfb: No synchronisation on CRT with linux-2.6.13-rc5
On Aug 7, 2005, at 21:13:54, Kyle Moffett wrote:

On Aug 7, 2005, at 12:13:38, Benjamin Herrenschmidt wrote:

_However_ there is an unrelated problem with some panels, including some of the 17": The panel doesn't always "sync" properly. This seems to be related to some subtle timing issue in the LVDS code but I don't know exactly what yet. You can usually get it back by repeatedly turning the backlight all the way down (which shuts the panel off) and back up until it "catches".

Hmm. This doesn't really fit as my issues are very reproducible. The behaviour under stock Debian 2.6.8 is identical during reboots and after fblevel 0 ; sleep X ; fblevel 15. Likewise, stock 2.6.11, 2.6.12.4, and 2.6.13-rc5, although I'm just getting back to testing things.

Damn. As soon as I say this, I go back and am completely unable to make 2.6.13-rc5 reproduce the issue. *grumble* black magic *grumble* :-D.

Cheers, Kyle Moffett
Re: Wireless support
On Aug 9, 2005, at 05:09:55, Jochen Friedrich wrote:

Third, both ndiswrapper and binary-only drivers only work on one platform. E.g. broadcom has a binary-only driver for their WLAN card on Linux, but only for mipsel (wrt54g). On Alpha or PowerPC, most WLAN equipment doesn't work under Linux, at all.

Definitely. I want my Airport Extreme to work! Many users of the BCM4301 chip can get it to work (kinda) with Linux via ndiswrapper, but that means they are much less likely to participate in any kind of reverse engineering effort, even if it's just testing a new driver.

Cheers, Kyle Moffett
Re: understanding Linux capabilities brokenness
On Aug 9, 2005, at 11:16:33, Christopher Warner wrote:

In my observer pragmatic view; yes. On many occasions, I've come to CAP calls only to be frustrated with the sheer disconnect of it all. It simply doesn't work. If it means having to break POSIX conformance for a working implementation, then so be it.

On Tue, 2005-08-09 at 00:46 -0400, James Morris wrote:

Let me play the Devil's advocate here. Should we be thinking about deprecating and removing capabilities from Linux?

One brief suggestion: A key/token interface was recently introduced that might be useful to allow a simple new inheritance model for "capabilities", "roles", "rootperms" or whatever other abstraction you create.

Cheers, Kyle Moffett
Re: [RFC/PATCH] Add pci_walk_bus function to PCI core
On Aug 10, 2005, at 02:10:49, Arjan van de Ven wrote:

On Wed, 2005-08-10 at 11:36 +1000, Paul Mackerras wrote:

Greg, Any comments on this patch? Would you be amenable to it going in post 2.6.13? The PCI error recovery infrastructure needs to be able to contact all the drivers affected by a PCI error event, which may mean traversing all the devices under a given PCI-PCI bridge. This patch adds a function to the PCI core that traverses all the PCI devices on a PCI bus and under any PCI-PCI bridges on that bus (recursively), calling a given function for each device.

is there a way to avoid the recursion somehow? Recursion is "not fun" stack usage wise, esp if you have really deep hierarchies

Hmm, it looks like PCI error recovery wants breadth-first recursion, so you should be able to do some sort of tail-recursion or something. If only one error-recovery action on a given subtree can be going at a time, you should be able to add an "error_recovery" linked-list to the device structure and do something like this:

    void recover(...)
    {
        struct list_head recovery_list = LIST_HEAD_INIT(recovery_list);

        list_add(&dev->error_recovery, &recovery_list);
        while (!list_empty(&recovery_list)) {
            struct some_device_type *dev = list_entry(recovery_list.next,
                    struct some_device_type, error_recovery);
            dev->some_recovery_function(dev, [...]);
            list_del(&dev->error_recovery);
        }
    }

Then each PCI-PCI bridge's some_recovery_function could do this:

    void some_recovery_function(struct some_device_type *dev, [...])
    {
        struct some_device_type *child;

        actually_do_my_recovery();
        list_for_each_entry(child, dev->some_pci_subdev_list, some_pci_list) {
            if (needs_recovery(child))
                list_add_tail(&child->error_recovery, &dev->error_recovery);
        }
    }

With such an arrangement, the callstack is as shallow as possible:

    recover
        some_recovery_function
            actually_do_my_recovery
            needs_recovery
        childs_recovery_function
            [...]

If you can have multiple simultaneous error-recovery actions per subtree, that wouldn't properly work unless they were exclusive-blocking, IE: an error recovery action triggers an error on a subtree which must recover itself. In that case, with some extra state saved in the recover function and passed to the "some_recovery_function", you could allow the other recovery to continue before resuming. If you can have two CPUs recovering the same device tree, I'd be inclined to wonder what kind of strange errors you're causing on the PCI bus :-D, and I'd be interested in an example of how that could work in any sane way.

Cheers, Kyle Moffett -- Premature optimization is the root of all evil in programming -- C.A.R. Hoare
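The pending-list idea in this message can be tried out in plain userspace C. The sketch below is not the PCI core's API and not the posted patch: `struct node`, `walk_tree`, `visit_log` and `record_visit` are all hypothetical names, and a simple singly linked work queue stands in for the kernel's list_head.

```c
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical stand-in for a PCI device or bridge. */
struct node {
	int id;
	struct node *children[MAX_CHILDREN];	/* unused slots are NULL */
	struct node *work_next;			/* link used only while walking */
};

/*
 * Visit every node at or below `root` breadth-first without recursion.
 * Pending nodes are chained through work_next, so the call stack stays
 * one frame deep no matter how deep the hierarchy is.
 */
void walk_tree(struct node *root,
	       void (*fn)(struct node *, void *), void *data)
{
	struct node *head = root, *tail = root;

	if (!root)
		return;
	root->work_next = NULL;

	while (head) {
		struct node *cur = head;
		int i;

		fn(cur, data);

		/* Queue the children behind everything already pending. */
		for (i = 0; i < MAX_CHILDREN && cur->children[i]; i++) {
			struct node *child = cur->children[i];

			child->work_next = NULL;
			tail->work_next = child;
			tail = child;
		}
		head = cur->work_next;
	}
}

/* Example callback: remember the order in which nodes were visited. */
struct visit_log {
	int ids[16];
	int count;
};

void record_visit(struct node *n, void *data)
{
	struct visit_log *seen = data;

	if (seen->count < 16)
		seen->ids[seen->count++] = n->id;
}
```

The cost is one extra pointer per device, and, as noted above, only one walk can use that pointer at a time, so concurrent walks over the same subtree would need their own locking or their own link field.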
Re: [Linux-cluster] Re: [PATCH 00/14] GFS
On Aug 10, 2005, at 09:26:26, AJ Lewis wrote:

On Wed, Aug 10, 2005 at 12:11:10PM +0100, Christoph Hellwig wrote:

On Wed, Aug 10, 2005 at 01:09:17PM +0200, Lars Marowsky-Bree wrote:

So for every directory hierarchy on a shared filesystem, each user needs to have the complete list of bindmounts needed, and automatically resync that across all nodes when a new one is added or removed? And then have that executed by root, because a regular user can't?

Do it in an initscripts and let users simply not do it, they shouldn't even know what kind of filesystem they are on.

I'm just thinking of a 100-node cluster that has different mounts on different nodes, and trying to update the bind mounts in a sane and efficient manner without clobbering the various mount setups. Ouch.

How about something like the following:

    cpslink()     => Create a Context Dependent Symlink
    readcpslink() => Return the Context Dependent path data
    readlink()    => Return the path of the Context Dependent Symlink as it
                     would be evaluated in the current context, basically
                     as a normal symlink.
    lstat()       => Return information on the Context Dependent Symlink in
                     the same format as a regular symlink.
    unlink()      => Delete the Context Dependent Symlink.

You would need an extra userspace tool that understands cpslink/readcpslink to create and get information on the links for now, but ls and ln could eventually be updated, and until then they would provide sane behavior. Perhaps this should be extended into a new API for some of the strange things several filesystems want to do in the VFS:

    extlink()     => Create an extended filesystem link (with type specified)
    readextlink() => Return the path (and type) for the link

The filesystem could define how each type of link acts with respect to other syscalls. OpenAFS could use extlink() instead of their symlink magic for adjusting the AFS volume hierarchy. The new in-kernel AFS client could use it in similar fashion (it has no method to adjust hierarchy, because it's still read-only). GFS could use it for their Context Dependent Symlinks. Since it would pass the type in as well, it would be possible to use it for different kinds of links on the same filesystem.

Cheers, Kyle Moffett -- Simple things should be simple and complex things should be possible -- Alan Kay
Re: CCITT-CRC16 in kernel
On Aug 11, 2005, at 11:19:59, linux-os (Dick Johnson) wrote:

On Thu, 11 Aug 2005 [EMAIL PROTECTED] wrote:

You're wrong in two ways: 1) You've got CRC-16 and CRC-CCITT mixed up, and 2) You've got the bit ordering backwards. Remember, I said very clearly, the lsbit is the first bit, and the first bit is the highest power of x. You can reverse the convention and still have a CRC, but that's not the way it's usually done and it's more awkward in software.

    CRC-CCITT = X^16 + X^12 + X^5 + X^0 = 0x8408, and NOT 0x1021
    CRC-16    = X^16 + X^15 + X^2 + X^0 = 0xa001, and NOT 0x8005

Thank you very much for your time, but what you say is completely different than anything else I have found on the net. Do the math:

    2^16 = 65536
    2^12 =  4096
    2^5  =    32
    2^0  =     1
    ------------
           69665 = 0x11021

No, it's like this: first, the 16 term is ignored, then:

    2^(15 - 12) = 2^3  =     8 = 0x0008
    2^(15 - 5)  = 2^10 =  1024 = 0x0400
    2^(15 - 0)  = 2^15 = 32768 = 0x8000
                                 ------
                               = 0x8408

This has 2 things:

1) The least-significant bit is the first bit
2) The first bit is the _highest_ power of X.

Cheers, Kyle Moffett
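The two constants in this exchange are the same polynomial written in opposite bit orders, which a few lines of C can confirm. This is only a sketch for checking the arithmetic: `reverse16`, `crc_ccitt_lsb` and `crc_ccitt_msb` are names made up here, the loops are plain bit-at-a-time versions rather than the table-driven code in the kernel, and the initial value is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the bit order of a 16-bit value (bit 0 <-> bit 15, and so on). */
uint16_t reverse16(uint16_t v)
{
	uint16_t r = 0;
	int i;

	for (i = 0; i < 16; i++) {
		r = (uint16_t)((r << 1) | (v & 1));
		v >>= 1;
	}
	return r;
}

/* Least-significant bit first: the polynomial is written as 0x8408. */
uint16_t crc_ccitt_lsb(uint16_t crc, const unsigned char *buf, size_t len)
{
	while (len--) {
		int i;

		crc ^= *buf++;
		for (i = 0; i < 8; i++)
			crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0x8408)
					: (uint16_t)(crc >> 1);
	}
	return crc;
}

/* Most-significant bit first: the same polynomial is written as 0x1021. */
uint16_t crc_ccitt_msb(uint16_t crc, const unsigned char *buf, size_t len)
{
	while (len--) {
		int i;

		crc ^= (uint16_t)(*buf++ << 8);
		for (i = 0; i < 8; i++)
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
					     : (uint16_t)(crc << 1);
	}
	return crc;
}
```

reverse16(0x1021) gives 0x8408 and reverse16(0x8005) gives 0xa001, matching the figures above. With an initial value of 0, the usual check string "123456789" yields 0x2189 from the LSB-first routine and 0x31c3 from the MSB-first one, so code written for one convention will not interoperate with the other even though both are "the CCITT polynomial".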
Re: CCITT-CRC16 in kernel
On Aug 11, 2005, at 13:08:56, linux-os (Dick Johnson) wrote:

Okay. Thanks. This means that hardware somehow swapped bits before doing a CRC. I wasn't aware that this was even possible as it would require additional storage, well I guess anything is now possible in an FPGA. The "Bible" has been: http://www.joegeluso.com/software/articles/ccitt.htm Note that on the very first page, reference is made to the 0x1021 poly. Then there is source-code that is entirely incompatible with anything in the kernel, but is supposed to work (it does work on my hardware). I have spent over a week grabbing everything on the Web that could help decipher the CCITT CRC and they all show this same kind of code and same kind of organization. Nothing I could find on the Web is like the linux kernel ccitt_crc. Go figure. Do you suppose it was bit-swapped to bypass a patent?

It could be that, or it could be some kernel genius figured out that one method is faster or better or more magical than the other on most platforms. Since the code works well, I would be disinclined to tinker with it. :-D.

Cheers, Kyle Moffett -- Q: Why do programmers confuse Halloween and Christmas? A: Because OCT 31 == DEC 25.
Re: Wireless support
On Aug 11, 2005, at 23:17:07, Lee Revell wrote:

On Fri, 2005-08-12 at 12:59 +1000, roucaries bastien wrote:

They post on this list 1 year and a half ago no answer.

I guess everyone on LKML has day jobs now, no one has time for fun stuff like reverse engineering drivers anymore... :-(

Much as I would love to help, I'm usually buried under schoolwork. In any case, I really have to admire the people behind the project: translating tens of thousands of MIPS assembly instructions to C, documenting the C, then giving the documentation to somebody else to write the driver even though by that point you could write it backwards in a blindfold. That has _got_ to be hard and frustrating work.

Cheers, Kyle Moffett -- Someone asked me why I work on this free (http://www.fsf.org/philosophy/) software stuff and not get a real job. Charles Schulz had the best answer: "Why do musicians compose symphonies and poets write poems? They do it because life wouldn't have any meaning for them if they didn't. That's why I draw cartoons. It's my life." -- Charles Schulz
Re: [Patch] Support UTF-8 scripts
On Aug 13, 2005, at 20:57:45, Alan Cox wrote:

I have "setxkbmap -symbols 'en_US(pc102)+gb'" in my ~/.xsession, and « and » are available as AltGr-z and AltGr-x respectively.

Most keyboards don't have an AltGr key.

You must be an American. Most of the world's keyboards have an AltGr key. You'll find that US keyboards have two alt keys to avoid confusing people (like one button mice ;)) but the right one is understood by the X bindings to be "AltGr". Even though the US keyboard is apparently lacking functionality, it's purely a text label issue.

And those of us who are Mac OS X oriented have patched our console and X keycodes to match the Mac way of generating symbols:

    Alt-\       = «
    Alt-Shift-\ = »
    Alt-Shift-+ = ±

If only someone could come up with a good character palette like exists on that OS, something that could generate a wide variety of keysyms, preferably all of UTF-8, and send them to the topmost window.

Cheers, Kyle Moffett
Re: [ck] [PATCH] dynamic-tick patch modified for SMP
On Aug 13, 2005, at 20:18:28, Con Kolivas wrote:

It does seem there are some timing issues with this patch, although it is also quite stable (up for 10 hours now). I've had a few interesting messages in my syslog suggesting problems:

    Hangcheck: hangcheck value past margin!

and then later on a few of:

    set_rtc_mmss: can't update from 0 to 59

It may be a good idea to rebase this patch off the new generic timekeeping subsystem that John Stultz is working on. He's cleaned up much of the code relating to system time processing, which may make it easier to get it right when skipping ticks (IE: You probably don't need to do anything special to replay missed ticks, the new timer code automatically handles it for you). There is an excellent LWN article on his project here: http://lwn.net/Articles/120850/

Cheers, Kyle Moffett
Re: [Patch] Support UTF-8 scripts
On Aug 14, 2005, at 02:18:13, Jason L Tibbitts III wrote: "LR" == Lee Revell <[EMAIL PROTECTED]> writes: LR> Is Larry smoking crack? From the Perl6-Bible: http://search.cpan.org/dist/Perl6-Bible/lib/Perl6/Bible/S03.pod: I think this confirms that the answer is yes. See the following at the above URL: Note that ?^ is functionally identical to !.?| differs from || in that ?| always returns a standard boolean value (either 1 or 0), whereas || returns the actual value of the first of its arguments that is true. Since when is the string "!.?|" an operator??? Or "?^", "+|", "~|", "?|", etc. I think Larry's gone off the deep end on this one. It may be an incredibly powerful and expressive language, but it seems _really_ strange, and probably will produce the best Obfuscated-code contest the world has ever seen. (Better even than the Perl5 one). Cheers, Kyle Moffett -- Simple things should be simple and complex things should be possible -- Alan Kay
Re: [PATCH] Fix PPC signal handling of NODEFER, should not affect sa_mask
On Aug 12, 2005, at 17:53:53, Steven Rostedt wrote: Two more systems that are different from Linux. So far, Linux is the odd ball out.
Make that three more systems (Mac OS X has the same behavior as the BSDs):
zeus:~ kyle$ uname -a
Darwin zeus.moffetthome.net 8.2.0 Darwin Kernel Version 8.2.0: Fri Jun 24 17:46:54 PDT 2005; root:xnu-792.2.4.obj~3/RELEASE_PPC Power Macintosh powerpc
zeus:~ kyle$ ./test_signal
sa_mask blocks other signals
SA_NODEFER does not block other signals
SA_NODEFER does not affect sa_mask
SA_NODEFER and sa_mask blocks sig
!SA_NODEFER blocks sig
SA_NODEFER does not block sig
sa_mask blocks sig
Cheers, Kyle Moffett -- Q: Why do programmers confuse Halloween and Christmas? A: Because OCT 31 == DEC 25.
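For anyone who wants to repeat the check on another system, here is a minimal sketch of the kind of probe that test_signal performs. It is not the test_signal program from this thread; the names probe() and struct probe_result are made up for illustration. It installs a SIGUSR1 handler with SIGUSR2 in sa_mask, raises SIGUSR1, and records from inside the handler which of the two signals is blocked.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical result type: what the handler saw in the signal mask. */
struct probe_result {
    int self_blocked;   /* 1 if SIGUSR1 was blocked inside its own handler */
    int other_blocked;  /* 1 if SIGUSR2 (listed in sa_mask) was blocked */
};

/* Written only by the handler; sig_atomic_t keeps the stores signal-safe. */
static volatile sig_atomic_t seen_self = -1;
static volatile sig_atomic_t seen_other = -1;

static void handler(int sig)
{
    sigset_t cur;

    /* With a NULL "set" argument sigprocmask() only reads the current mask. */
    sigprocmask(SIG_BLOCK, NULL, &cur);
    seen_self = sigismember(&cur, sig);
    seen_other = sigismember(&cur, SIGUSR2);
}

/* Deliver SIGUSR1 once with the given sa_flags and report the mask seen. */
struct probe_result probe(int flags)
{
    struct sigaction sa, old;
    struct probe_result res;
    int rc;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sa.sa_flags = flags;
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGUSR2);    /* sa_mask asks for SIGUSR2 to be blocked */

    rc = sigaction(SIGUSR1, &sa, &old);
    assert(rc == 0);

    seen_self = seen_other = -1;
    raise(SIGUSR1);                     /* handler runs before raise() returns */
    sigaction(SIGUSR1, &old, NULL);     /* restore the previous disposition */

    res.self_blocked = seen_self;
    res.other_blocked = seen_other;
    return res;
}
```

Calling probe(SA_NODEFER) and probe(0) and comparing the two results shows whether SA_NODEFER changes how sa_mask is applied on the system under test, which is the behavior being debated here.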
Re: [RFC][PATCH 2.6.13-rc6] add Dell Systems Management Base Driver (dcdbas) with sysfs support
On Aug 15, 2005, at 16:05:22, Doug Warzecha wrote: This patch adds the Dell Systems Management Base Driver with sysfs support. +On some Dell systems, systems management software must access certain +management information via a system management interrupt (SMI). The SMI data +buffer must reside in 32-bit address space, and the physical address of the +buffer is required for the SMI. The driver maintains the memory required for +the SMI and provides a way for the application to generate the SMI. +The driver creates the following sysfs entries for systems management +software to perform these system management interrupts: Why can't you just implement the system management actions in the kernel driver? This is tantamount to a binary SMI hook to userspace. What functionality does this provide on a dell system from an administrator's point of view? +Host Control Action + +Dell OpenManage supports a host control feature that allows the administrator +to perform a power cycle or power off of the system after the OS has finished +shutting down. On some Dell systems, this host control feature requires that +a driver perform a SMI after the OS has finished shutting down. + +The driver creates the following sysfs entries for systems management software +to schedule the driver to perform a power cycle or power off host control +action after the system has finished shutting down: + +/sys/devices/platform/dcdbas/host_control_action +/sys/devices/platform/dcdbas/host_control_smi_type +/sys/devices/platform/dcdbas/host_control_on_shutdown How is this different from shutdown() or reboot()? What exactly is smi_type used for? Please provide better documentation on how to use this and what it does. If this is supposed to be used with the RBU code to trigger a BIOS update, then why not integrate it into one kernel driver that receives firmware, loads it into the BIOS, and properly resets the machine at powerdown? I think PowerPC does a similar thing with OpenFirmware flash memory. 
When I change the default boot device or other firmware environment, I get a message from the kernel upon shutdown: Erasing flash bank 1... Writing flash bank 1... Would not a similar system work for Dell? It would be far simpler to use than the current mess of patches you've proposed. If done properly, I could even do this: cat firmware-with-checksum.img >/sys/devices/platform/dellbios/firmware_upgrade Then an ordinary system reboot or shutdown would automatically use the SMI and host-control-action to upgrade the firmware and shutdown or reboot, instead of the normal ACPI shutdown and reboot code. Cheers, Kyle Moffett -- Someone asked me why I work on this free (http://www.fsf.org/philosophy/) software stuff and not get a real job. Charles Shultz had the best answer: "Why do musicians compose symphonies and poets write poems? They do it because life wouldn't have any meaning for them if they didn't. That's why I draw cartoons. It's my life." -- Charles Shultz
Re: [RFC][PATCH 2.6.13-rc6] add Dell Systems Management Base Driver (dcdbas) with sysfs support
On Aug 15, 2005, at 18:58:56, [EMAIL PROTECTED] wrote: Why can't you just implement the system management actions in the kernel driver? This is tantamount to a binary SMI hook to userspace. What functionality does this provide on a dell system from an administrator's point of view? Kyle, I'm sure that not everybody agrees with the whole concept of SMI calls. Nevertheless, these calls exist, and in order to have a complete systems-management solution, we have to provide a way to do SMI calls. Now, we have developed a way to do these SMI calls from userspace without kernel support, but we are trying to be community-friendly and show our hooks in the open, rather than trying to sneak them in under the covers. You might not like the concept of a generic hook for SMI calls in the kernel, but the alternatives are hardly better. One alternative is the already-mentioned method that we do things under the covers in userspace. Another alternative is that we write separate kernel code for each and every SMI call that exists in the Dell BIOS. The second alternative is not entirely feasible. We have over 60 SMI functions, and we would have to write a kernel-mode wrapper for each and every one. I hope you agree that code that doesn't exist is less buggy than code that is, and that code that is in userspace is a whole lot less likely to cause a kernel crash than code that is in the kernel. I think the second alternative is actually feasible and preferable. The point of the kernel is to provide safe and secure access to two things: 1) Hardware through an abstraction layer 2) Software services (like IP stack) that are not feasible to do in userspace. A system that just provides a hunk of DMA RAM and the ability to generate interrupts is definitely not 2, and does not really follow the ideal behind 1 either. I gave the firmware example earlier. 
There are several devices that provide access to update firmware by reading and writing a firmware file directly in sysfs, then updating it on reboot if necessary. We are trying to keep our kernel bloat down. We don't really think that customers of IBM or HP really want their Red Hat kernels loaded down with a bunch of Dell-only code. That's what kconfig is for. My G4 Powerbook doesn't have support for hardware found in my G4 desktop any more than an IBM box should be forced to have support for Dell hardware, yet all platforms work fine from the same kernel tree. Additionally, we are releasing an open source library (GPL/OSL dual license) that can use these hooks to perform many systems management functions in userspace. See http://linux.dell.com/libsmbios/main/. We should have code in libsmbios to do SMI using this driver within about two weeks. We are currently writing the SMI hooks in libsmbios using this posted version of the driver. I am the maintainer of this project, and it is my goal to have code in libsmbios for every Dell SMI call. That's a nice project. I applaud Dell for its openness, but that's not the only issue here, the kernel needs good engineering too. I would suggest that you try to implement as much as is possible in a kernel driver. Firmware loading support, for example, or hardware sensors, should integrate well into sysfs and be accessible through existing tools if possible. Doug also mentions fan status and control in his mail. Could you provide such access through existing fan status/control interfaces so that existing tools work as well? We would welcome feedback on a better way to implement this driver in the kernel, but the fact remains that we have to have a way to do this, and we are open-sourcing all of the code necessary to get this done. Thank you for your effort. You guys have made significant progress, but IMHO, you've still got a ways to go. Keep up the good work, though!
Cheers, Kyle Moffett -- Unix was not designed to stop people from doing stupid things, because that would also stop them from doing clever things. -- Doug Gwyn
Re: [RFC][PATCH 2.6.13-rc6] add Dell Systems Management Base Driver (dcdbas) with sysfs support
On Aug 15, 2005, at 19:38:49, Doug Warzecha wrote: On Mon, Aug 15, 2005 at 04:23:37PM -0400, Kyle Moffett wrote: Why can't you just implement the system management actions in the kernel driver? We want to minimize the amount of code in the kernel and avoid having to update the driver each time a new system management command is added. One of the recent trends in kernel driver development is to make as much as possible accessible through standard tools (like with echo and cat via sysfs). The libsmbios project is being updated to use this code. http://linux.dell.com/libsmbios/main/. Using the libsmbios code, you will be able to set all of the options in BIOS F2 screen from Linux userspace. Also, libsmbios is looking at implementing a few other things like fan status. Libsmbios is 100% open-source (OSL/GPL dual license). From my point of view, this driver could use sysfs almost entirely and put all of the hardware-manipulation code completely in kernel space, along with the hardware detection code. You could have plain-text files in /sys/bus/platform/dellbios/ that have all of the BIOS F2 options accessible to the admin from the command line, without special tools. (You could always add an extra program that presents a BIOS-like interface) The power cycle feature of the system powers off the system for a few seconds and then powers the system back on without user intervention. shutdown() and reboot() don't provide that feature. Please ensure that the code is only run on reboot (and maybe halt), but definitely not in the poweroff code. What exactly is smi_type used for? Please provide better documentation on how to use this and what it does. The method of generating a host control SMI is not exactly the same for each PowerEdge system listed in dcdbas.txt. host_control_smi_type tells the driver how to generate the host control SMI for the system in use. I'll update dcdbas.txt with the SMI type value associated with the systems listed in that file.
This is an _excellent_ reason why more of this should be in the kernel. What happens if the wrong SMI is used? Shouldn't it be relatively easy for the kernel to determine the correct SMI itself? Thanks for your hard work! Cheers, Kyle Moffett -- Unix was not designed to stop people from doing stupid things, because that would also stop them from doing clever things. -- Doug Gwyn
Re: [RFC][PATCH 2.6.13-rc6] add Dell Systems Management Base Driver (dcdbas) with sysfs support
On Aug 16, 2005, at 00:34:51, Chris Wedgwood wrote: On Mon, Aug 15, 2005 at 04:23:37PM -0400, Kyle Moffett wrote: Why can't you just implement the system management actions in the kernel driver? Why put things in the kernel unless it's really needed? I'm not thrilled about the lack of userspace support for this driver but that still doesn't mean we need to shovel wads of crap into the kernel. I'm worried that it might be more of a mess in userspace than it could be if done properly in the kernel. Hardware drivers, especially for something as critical as the BIOS, should probably be done in-kernel. Look at the mess that X has become, it mmaps /dev/mem and pokes at the PCI busses directly. I just don't want an SMI-driver to become another /dev/mem. Cheers, Kyle Moffett -----BEGIN GEEK CODE BLOCK----- Version: 3.12 GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$ L(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+ PGP+++ t+(+++) 5 X R? tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r !y?(-) ------END GEEK CODE BLOCK------
Re: [PATCH 2.6.13-rc6 1/2] New Syscall: get rlimits of any process
On Aug 16, 2005, at 13:34:34, Wieland Gmeiner wrote: On Sat, 2005-08-13 at 15:11 -0700, Greg KH wrote: On Fri, Aug 12, 2005 at 07:48:22PM +0200, Wieland Gmeiner wrote: @@ -294,3 +294,4 @@ ENTRY(sys_call_table) .long sys_inotify_init .long sys_inotify_add_watch .long sys_inotify_rm_watch +.long sys_getprlimit Please follow the proper kernel coding style when writing new kernel code... Hm, Documentation/CodingStyle suggests using descriptive names, so something like getrlimit(...)/getrlimit_per_process(pid_t pid, ...) would be more appropriate? I think he was commenting more on the code indentation and braces placement than any naming issue. There was also a good guide to kernel whitespace posted to the LKML a week or so ago, please check the archives and review that as well. I have one small comment on something you stated in your original mail: Otherwise some checking on the validity of the given pid is done and if the given process is found access is granted if - the calling process holds the CAP_SYS_RESOURCE capability or - the calling process uid equals the uid of the process whose rlimit is being read or - the calling process uid equals the suid of the process whose rlimit is being read or - the calling process euid equals the uid of the process whose rlimit is being read or - the calling process euid equals the suid of the process whose rlimit is being read I suggest that you revise this list to the following: If the calling process can ptrace the target process, then allow rlimits to be read and written such that the hard limits may not be raised unless one of the two processes possesses the CAP_SYS_RESOURCE capability ptrace implies the ability to execute arbitrary code in the given process, which means that even without this new function the calling process theoretically could obtain and set rlimits for that process anyways, subject to its own CAP_SYS_RESOURCE capability. 
Such a situation would guarantee that there are no new security holes, and would limit the number of inter-process access rules which kernel developers need to understand. I believe some simple Googling and grepping through the kernel code should reveal the necessary ptrace-related process checks. Cheers, Kyle Moffett -- There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult. -- C.A.R. Hoare
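To make the suggested rule concrete, here is a rough userspace sketch of the decision logic only. Nothing in it is kernel API: struct proc_view, struct limit_pair and both function names are hypothetical, and the "caller can ptrace target" answer is passed in as a flag because the real check would have to come from the kernel's existing ptrace permission code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of a process: only the one capability that matters here. */
struct proc_view {
    bool has_cap_sys_resource;
};

/* Hypothetical soft/hard limit pair, mirroring rlim_cur and rlim_max. */
struct limit_pair {
    unsigned long soft;
    unsigned long hard;
};

/* Reading another process's rlimits: allowed whenever the caller could ptrace it. */
bool may_get_rlimit(bool caller_can_ptrace_target)
{
    return caller_can_ptrace_target;
}

/*
 * Writing another process's rlimits: the caller must be able to ptrace the
 * target, and the hard limit may only be raised if either process holds
 * CAP_SYS_RESOURCE.
 */
bool may_set_rlimit(bool caller_can_ptrace_target,
                    const struct proc_view *caller,
                    const struct proc_view *target,
                    const struct limit_pair *old,
                    const struct limit_pair *want)
{
    assert(caller != NULL && target != NULL && old != NULL && want != NULL);

    if (!caller_can_ptrace_target)
        return false;
    if (want->soft > want->hard)        /* soft limit must not exceed hard limit */
        return false;
    if (want->hard > old->hard &&
        !caller->has_cap_sys_resource && !target->has_cap_sys_resource)
        return false;                   /* raising the hard limit needs the capability */
    return true;
}
```

The point of the sketch is that there is one access rule (ptrace) plus one capability test, rather than a separate list of uid/euid/suid comparisons.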
Re: Bug#321442: kernel-source-2.6.8: fails to compile on powerpc (drivers/ide/ppc/pmac.c)
On Aug 13, 2005, at 18:54:30, LT-P wrote: Le lun 08 aoû 2005 17:57:04 CEST, Horms <[EMAIL PROTECTED]> a écrit: Can you please enable BLK_DEV_IDEDMA_PCI and see if that resolves your problem. If it does, then the following patch should fix Kconfig so that BLK_DEV_IDEDMA_PCI needs to be enabled for BLK_DEV_IDE_PMAC to be enabled. It should patch cleanly against Debian's 2.6.8 and Linus' current Git tree. It seems to solve the problem, thanks. Sometimes, I feel like I am the only person in the world to compile the kernel on powerpc... :) Actually, I ran into this same bug a day or so ago when updating to 2.6.13-rc6, it's just I noticed the error, fixed my config, then recompiled and forgot about it completely until now :-D. Thanks for the bug report, though! Cheers, Kyle Moffett -- I have yet to see any problem, however complicated, which, when you looked at it in the right way, did not become still more complicated. -- Poul Anderson
Re: [RFC][PATCH 2.6.13-rc6] add Dell Systems Management Base Driver (dcdbas) with sysfs support
On Aug 17, 2005, at 01:33:00, Matt Domsch wrote: This is conceptually similar to how SCSI Generic (either /dev/sg or ioctl(SG_IO)) works (userspace passes in preformatted SCSI CDBs and gets back the resultant CDBs and extended sense data). The sg driver doesn't look at the data being passed down to any great extent. It doesn't validate that the command will make sense to the end device. This is not true anymore. Recently the SG driver obtained a basic form of SCSI command checking to prohibit vendor commands from those processes without CAP_SYS_RAWIO, even if said process had full access to the device node itself. Cheers, Kyle Moffett -- Simple things should be simple and complex things should be possible -- Alan Kay
Re: Why Ext2/3 needs immutable attribute?
On Apr 17, 2005, at 12:12, Xin Zhao wrote: Thanks for your reply. Yes. I know, with immutable, even root cannot modify sensitive files. What I am curious is if an intruder has root access, he may have many ways to turn off the immutable protection and modify files. So immutable is designed just to prevent a valid root from making silly mistakes? Xin But without the proper capability, root _can't_ change the immutable bit. Of course, that also applies to DAC checks too. Personally, I find the immutable bit most useful at preventing accidents. I have several scripts designed specifically to access the same file, and I want to prevent one of my admins from accidentally editing that file by hand. The best way is with a big comment in the file itself and the immutable bit. Cheers, Kyle Moffett -----BEGIN GEEK CODE BLOCK----- Version: 3.12 GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$ L(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+ PGP+++ t+(+++) 5 X R? tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r !y?(-) ------END GEEK CODE BLOCK------
Re: More performance for the TCP stack by using additional hardware chip on NIC
On Apr 17, 2005, at 19:37, Horst von Brand wrote: Andreas Hartmann <[EMAIL PROTECTED]> said: Alacritech developed a new chip for NIC's (http://www.alacritech.com/html/tech_review.html), which makes it possible to take away the TCP stack from the host CPU. Therefore, the host CPU has more performance for the applications according to Alacritech. This sounds interesting. This idea has been discussed around here a couple of times, and the consensus is that it is a bad idea: IP (and upper protocol) processing is not expensive, if done right, so this really doesn't buy much; this forces a particular interface to networking into the kernel, losing flexibility that way is always bad; there is no access to futzing around in between (for example, for firewalling and such); and if the "hardware implementation" has bugs, you are screwed. What I think would be _much_ more useful is a generic low-power multi-proc MIPS/PPC system on a PCI card with a certain amount of RAM, etc that could be programmed at runtime by the master CPU. Then you lose none of the flexibility, it can be run in the same endian-mode as the host CPU, and it would allow you to program it for much more complicated DMA. You could do anything from linux software RAID, audio processing, encryption, TCP/IP stack acceleration, extra scatter-gather for your disk controller, etc. If it was low-cost, IE: cheaper than adding extra full-speed CPUs to the system, and using a decent bi-endian, vector-capable CPU (Like PPC), you might find that people will buy them for the flexibility. Such a thing might also be useful for the prezero folks, it could be used (when not otherwise occupied) for zeroing unused pages. Personally, I think I'd buy one or two just to tinker with them :-D. Cheers, Kyle Moffett -----BEGIN GEEK CODE BLOCK----- Version: 3.12 GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$ L(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+ PGP+++ t+(+++) 5 X R? tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r !y?(-) ------END GEEK CODE BLOCK------
Re: Power consumption HZ100, HZ250, HZ1000: new numbers
On Jul 31, 2005, at 18:32:47, Pavel Machek wrote: and cpufreq is useful to keep your desktop cold, too. But I don't want my desktop cold!!! That would ruin its usefulness as a 400W dorm space-heater!!! :-D *starts boinc client running in the background* Cheers, Kyle Moffett -- There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult. -- C.A.R. Hoare
Re: [PATCH 00/14] GFS
On Aug 2, 2005, at 21:00:02, Hans Reiser wrote: Arjan van de Ven wrote: because reiser got merged before jbd. Next question. That is the wrong reason. We use our own journaling layer for the reason that Vivaldi used his own melody. I don't know anything about GFS, but expecting a filesystem author to use a journaling layer he does not want to is a bit arrogant. Now, if you got into details, and said jbd does X, Y and Z, and GFS does the same X and Y, and does not do Z as well as jbd, that would be a more serious comment. He might want to look at how reiser4 does wandering logs instead of using jbd. but I would never claim that for sure some other author should be expected to use it. and something like changing one's journaling system is not something to do just before a merge. I don't want to start another big reiser4 flamewar, but... "I don't know anything about Reiser4, but expecting a filesystem author to use a VFS layer he does not want to is a bit arrogant. Now, if you got into details, and said the linux VFS does X, Y, and Z, and Reiser4 does..." Do you see my point here? If every person who added new kernel code just wrote their own thing without checking to see if it had already been done before, then there would be a lot of poorly maintained code in the kernel. If a journalling layer already exists, _new_ journaled filesystems should either (A) use the layer as is, or (B) fix the layer so it has sufficient functionality for them to use, and submit patches. That way if somebody later says, "Ah, crap, there's a bug in the kernel journalling layer", and fixes it, there are not eight other filesystems with their own open-coded layers that need to be audited for similar mistakes. 
This is similar to why some kernel developers did not like the Reiser4 code, because it implemented some private layers that looked kinda like stuff the VFS should be doing (Again, I don't want to get into that argument again, I'm just bringing up the similarities to clarify _this_ particular point, as that one has been beaten to death enough already). Now the question for GFS is still a valid one; there might be reasons to not use it (which is fair enough) but if there's no real reason then using jbd sounds a lot better given its maturity (and it is used by 2 filesystems in -mm already). Personally, I am of the opinion that if GFS cannot use jbd, the developers ought to clarify why it isn't usable, and possibly submit fixes to make it useful, so that others can share the benefits. Cheers, Kyle Moffett -- I lost interest in "blade servers" when I found they didn't throw knives at people who weren't supposed to be in your machine room. -- Anthony de Boer
Re: Calling suspend() in halt/restart/shutdown -> not a good idea
On Aug 3, 2005, at 07:40:54, Benjamin Herrenschmidt wrote: I'd like to get rid of shutdown callback. Having two copies of code (one in callback, one in suspend) is ugly. Well, it's obviously not a good time for this. First, suspend and shutdown don't necessarily do the same thing, then it just doesn't work in practice. So either do it right completely or not at all, but 2.6.13 isn't the place for a half-assed hack that looks like a solution to you. One possible way to proceed might be to add a new callback that takes a pm_message_t: powerdown() If it exists, it would be called in both the suspend and shutdown paths, before the suspend() and shutdown() calls to that driver are made. As drivers are fixed to clean up and combine that code, they could put the merged result into the powerdown() function, and remove their suspend() and shutdown() functions. Cheers, Kyle Moffett -- I lost interest in "blade servers" when I found they didn't throw knives at people who weres't supposed to be in your machine room. -- Anthony de Boer
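A small userspace mock-up of the call ordering proposed above may help. None of these types exist in the kernel under these names: pm_msg stands in for pm_message_t, struct drv_ops for the driver callbacks, and demo_powerdown() is a toy driver hook that only counts calls. It is a sketch of the idea, not a patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's pm_message_t. */
typedef struct {
    int event;
} pm_msg;

/* Hypothetical per-driver callbacks; any of them may be NULL. */
struct drv_ops {
    int  (*powerdown)(void *dev, pm_msg msg);  /* proposed shared hook */
    int  (*suspend)(void *dev, pm_msg msg);    /* legacy suspend hook */
    void (*shutdown)(void *dev);               /* legacy shutdown hook */
};

/* Suspend path: powerdown() first if present, then the legacy suspend(). */
int do_suspend(const struct drv_ops *ops, void *dev, pm_msg msg)
{
    assert(ops != NULL);
    if (ops->powerdown) {
        int rc = ops->powerdown(dev, msg);
        if (rc)
            return rc;      /* a failed powerdown aborts the suspend */
    }
    if (ops->suspend)
        return ops->suspend(dev, msg);
    return 0;
}

/* Shutdown path: powerdown() first if present, then the legacy shutdown(). */
void do_shutdown(const struct drv_ops *ops, void *dev, pm_msg msg)
{
    assert(ops != NULL);
    if (ops->powerdown)
        (void)ops->powerdown(dev, msg);     /* shutdown cannot be aborted */
    if (ops->shutdown)
        ops->shutdown(dev);
}

/* Toy driver hook: treats dev as an int counter and bumps it on each call. */
int demo_powerdown(void *dev, pm_msg msg)
{
    (void)msg;
    ++*(int *)dev;
    return 0;
}
```

A converted driver would fill in only .powerdown and leave the other two pointers NULL, so both paths end up in the same merged code.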
Why is kmem_bufctl_t different across platforms?
While exploring the asm-*/types.h files, I discovered that the type "kmem_bufctl_t" is differently defined across each platform, sometimes as a short, and sometimes as an int. The only file where it's used is mm/slab.c, and as far as I can tell, that file doesn't care at all, aside from preferring it to be a small-sized type. I found this comment:
/*
 * kmem_bufctl_t:
 *
 * Bufctl's are used for linking objs within a slab
 * linked offsets.
 *
 * This implementation relies on "struct page" for locating the cache &
 * slab an object belongs to.
 * This allows the bufctl structure to be small (one int), but limits
 * the number of objects a slab (not a cache) can contain when off-slab
 * bufctls are used. The limit is the size of the largest general cache
 * that does not use off-slab slabs.
 * For 32bit archs with 4 kB pages, is this 56.
 * This is not serious, as it is only for large objects, when it is unwise
 * to have too many per slab.
 * Note: This limit can be raised by introducing a general cache whose size
 * is less than 512 (PAGE_SIZE<<3), but greater than 256.
 */
It appears to state that the max kmem_bufctl_t value is ~56 on most setups, although it could be higher with 64-bit or bigger pages. Since this value is never used by anything except that kernel-internal file, should it be unified across all architectures? If so, I'll send a patch to remove the various typedefs and introduce a single "typedef unsigned short kmem_bufctl_t" in include/linux/types.h Cheers, Kyle Moffett -- Premature optimization is the root of all evil in programming -- C.A.R. Hoare
Re: Why is kmem_bufctl_t different across platforms?
On Aug 28, 2005, at 19:37:16, Adrian Bunk wrote: On Sun, Aug 28, 2005 at 02:55:03PM -0700, Andrew Morton wrote: Kyle Moffett <[EMAIL PROTECTED]> wrote: While exploring the asm-*/types.h files, I discovered that the type "kmem_bufctl_t" is differently defined across each platform, sometimes as a short, and sometimes as an int. The only file where it's used is mm/slab.c, and as far as I can tell, that file doesn't care at all, aside from preferring it to be a small-sized type. I don't think there's any good reason for this. -mm's slab-leak-detector.patch switches them all to unsigned long. What about moving it to include/linux/types.h ? Or, since it's _only_ used in mm/slab.c, why not put it in there? Here is a really simple patch that does just that: kmem_bufctl_t-consolidation.patch Description: Binary data Cheers, Kyle Moffett -- Q: Why do programmers confuse Halloween and Christmas? A: Because OCT 31 == DEC 25.
Re: Is cdrecord dependent on some kind of bus type?
On Aug 29, 2005, at 07:46:04, jeff shia wrote: Hello, Is cdrecord dependent on some kind of bus type,such as pci or usb? And the older version such as cdrecord-1.2? can cdrecord-1.2 run on kernel-2.4.18? Please ask these kinds of questions of the cdrecord mailing-list or the cdrecord author Jörg Schilling, instead of on this list (this is a kernel development list, as opposed to a linux-users list). Also, you sent duplicate copies of your message only hours apart. Please don't do this. Yes, we did get your message, but nobody replied to it because it was off-topic and indicated a complete lack of RTFM and STFW. Please go read the associated documentation before asking questions, and then ask them on the appropriate forum (if you still have questions). Here is a good document about asking good questions: http://www.catb.org/~esr/faqs/smart-questions.html Cheers, Kyle Moffett -- Premature optimization is the root of all evil in programming -- C.A.R. Hoare
Re: inotify and IN_UNMOUNT-events
On Aug 30, 2005, at 23:33:27, Robert Love wrote: On Tue, 2005-08-30 at 21:46 +0200, Juergen Quade wrote: Playing around with inotify I have some problems to generate/receive IN_UNMOUNT-events (using a self written application and inotify_utils-0.25; kernel 2.6.13). Doing: - mount /dev/hda1 /mnt - add a watch to the path /mnt/ ("./inotify_test /mnt") - umount /mnt results in two events: 1. IN_DELETE_SELF (mask=0x0400) 2. IN_IGNORED (mask=0x8000) Any ideas? "/mnt" is not unmounted, stuff inside of it is. Watch, say, "/mnt/foo/bar" and when /dev/hda1 is unmounted, you will get an IN_UNMOUNT on the watch. I think this might work as well: # mount /dev/hda1 /mnt # ./inotify_test /mnt/. & # umount /mnt That should get the effect you are looking for. Cheers, Kyle Moffett -- I have yet to see any problem, however complicated, which, when you looked at it in the right way, did not become still more complicated. -- Poul Anderson
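For experimenting with the cases above, here is a minimal watcher sketch in the spirit of inotify_test. It is not the inotify_test program from inotify_utils; describe_mask() and print_events() are names invented for this example. It uses only the standard inotify_init(), inotify_add_watch() and read() interface from <sys/inotify.h>, prints the raw mask of each event, and names the three bits discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Only the event bits that matter for the unmount question. */
static const struct {
    uint32_t bit;
    const char *name;
} mask_names[] = {
    { IN_DELETE_SELF, "IN_DELETE_SELF" },
    { IN_UNMOUNT,     "IN_UNMOUNT" },
    { IN_IGNORED,     "IN_IGNORED" },
};

/* Write the names of the known bits set in mask into buf, joined by '|'. */
char *describe_mask(uint32_t mask, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof mask_names / sizeof mask_names[0]; i++) {
        if (!(mask & mask_names[i].bit))
            continue;
        size_t used = strlen(buf);
        snprintf(buf + used, len - used, "%s%s",
                 used ? "|" : "", mask_names[i].name);
    }
    return buf;
}

/* Watch path for all events and print each one until the watch is dropped. */
int print_events(const char *path)
{
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    char names[64];
    int fd = inotify_init();

    if (fd < 0)
        return -1;
    if (inotify_add_watch(fd, path, IN_ALL_EVENTS) < 0) {
        close(fd);
        return -1;
    }
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n <= 0)
            break;
        for (char *p = buf; p < buf + n; ) {
            const struct inotify_event *ev = (const struct inotify_event *)p;

            printf("mask=0x%04x %s\n", (unsigned)ev->mask,
                   describe_mask(ev->mask, names, sizeof names));
            if (ev->mask & IN_IGNORED) {    /* kernel removed the watch */
                close(fd);
                return 0;
            }
            p += sizeof *ev + ev->len;      /* events are variable length */
        }
    }
    close(fd);
    return 0;
}
```

Running print_events() once on "/mnt" and once on a path inside the mounted filesystem, then unmounting, should show which of the two sequences described above you get on your kernel.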
Re: APs from the Kernel Summit run Linux
On Aug 31, 2005, at 16:32:11, Vojtech Pavlik wrote: On Wed, Aug 31, 2005 at 08:53:19PM +0100, Russell King wrote: On Wed, Aug 31, 2005 at 12:55:12PM -0400, Mark Lord wrote: I'll try loading the works into another ARM system I have here, and see (1) if it runs as-is, and (2) what the disassembly shows. You can identify ARM code quite readily - look for a large number of 32-bit words naturally aligned and grouped together whose top nibble is 14 - ie 0xE... The top nibble is the conditional execution field, and 14 is "always". Didn't find that. Anyway: The first and third parts contain a repeating 7-byte sequence 81 40 20 10 08 04 02 near the beginning, while part 2 is padded with zeroes in the same place. That sequence is altered in the first and last repetitions, like this: 88 4020 1008 0402 81 4020 1008 0402 [...] 81 4020 1008 0402 81 4020 1008 04c2 The 4020 and 0402 look oddly symmetrical to me, but that could just be my imagination. I wrote a quick perl script to find the number of occurrences of 8-bit aligned sequences of 16-bits, for all 16-bit values. It has some interesting (and potentially useful) results. The script: http://zeus.moffetthome.net/~kyle/hexfreq The output: http://zeus.moffetthome.net/~kyle/dwl.hexmult Reprocessed output by frequency: http://zeus.moffetthome.net/~kyle/dwl.hexfreq Reprocessing command: dwl.hexfreq Cheers, Kyle Moffett -- Somone asked me why I work on this free (http://www.fsf.org/philosophy/) software stuff and not get a real job. Charles Shultz had the best answer: "Why do musicians compose symphonies and poets write poems? They do it because life wouldn't have any meaning for them if they didn't. That's why I draw cartoons. It's my life." -- Charles Shultz - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
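The "top nibble is 14" heuristic quoted above can be tried mechanically. This is a rough sketch rather than the perl script linked in the message: it assumes a little-endian image, and the function name count_arm_always_words() is invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count naturally aligned 32-bit words whose condition field (top nibble)
 * is 0xE, i.e. "always". Assumes little-endian byte order, so the most
 * significant byte of each word is the fourth one. Trailing bytes that do
 * not make up a full word are ignored.
 */
static size_t count_arm_always_words(const uint8_t *buf, size_t len)
{
    size_t count = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i + 4 <= len; i += 4) {
        if ((buf[i + 3] >> 4) == 0xE)
            count++;
    }
    return count;
}
```

A high ratio of such words to total words over a large aligned region would suggest ARM code; a low ratio, as reported above, suggests something else (or a big-endian or compressed image).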
Re: [RFC] A more general timeout specification
On Sep 1, 2005, at 11:18:52, Roman Zippel wrote: On Thu, 1 Sep 2005, Joe Korty wrote: On Thu, Sep 01, 2005 at 11:19:51AM +0200, Roman Zippel wrote: You still didn't explain what's the point in choosing different clock sources for a _timeout_. Well, if CLOCK_REALTIME is set forward by a minute, timers & timeout specified against that clock will expire a minute earlier than expected. That just rather suggests that the pthread API is broken as usual. (No other possible user was mentioned so far.) How about a hypothetical time-based event daemon. I want to run some jobs every 10 minutes that the system is running (not off or suspended), I want to run other jobs every hour in real time, and if one such timer expires while suspended, I want to run it immediately to catch up. The first suggests CLOCK_MONOTONIC, and the second works better with CLOCK_REALTIME. So in practice it's easier to advance CLOCK_MONOTONIC/CLOCK_REALTIME equally and only apply time jumps to CLOCK_REALTIME. I thought that's what he said, but maybe I'm just confused :-D. Cheers, Kyle Moffett -- Premature optimization is the root of all evil in programming -- C.A.R. Hoare - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
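To make the two-clock daemon example a bit more concrete, here is a small sketch, assuming POSIX clock_gettime(); the helper name seconds_until() is hypothetical. The "every 10 minutes of running time" jobs would keep their deadline on CLOCK_MONOTONIC and the wall-clock jobs on CLOCK_REALTIME.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/*
 * Seconds remaining until `deadline`, measured on `clock`.
 * A negative result means the deadline has already passed, which is how a
 * CLOCK_REALTIME deadline looks right after the wall clock is set forward.
 */
static double seconds_until(clockid_t clock, const struct timespec *deadline)
{
    struct timespec now;
    int rc = clock_gettime(clock, &now);

    assert(rc == 0);  /* both clocks discussed here are expected to exist */
    (void)rc;
    return (double)(deadline->tv_sec - now.tv_sec)
         + (double)(deadline->tv_nsec - now.tv_nsec) / 1e9;
}
```

Setting CLOCK_REALTIME forward by a minute shortens seconds_until(CLOCK_REALTIME, ...) by a minute, while a deadline tracked with CLOCK_MONOTONIC is unaffected.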
[RFC] Splitting out kernel<=>userspace ABI headers
A while ago there was a big discussion about splitting out the userspace-accessible portions of the kernel headers into a separate directory, "kabi", "kernel-abi", "linux-abi", or a half-dozen other suggestions. Linus sprinkled a bit of holy-penguin-pee on the idea, but nothing ever really happened after that. I have some available time at the moment, and I would be willing to undertake the task, but I would like a bit of guidance first, both from Linus/akpm/etc, and from the list in general, about a few initial issues I see from my initial attempts to sort through the mess: 1) There are a couple header files upon which almost everything else depends, among them {asm,linux}/{posix_,}types.h, which have some significant duplications. Many of the archs have weird sizes for those types to preserve some backwards-compatibility ABI, but nowhere does it explain if there are any type-size restrictions in general. I would propose that those headers be reorganized so that there are sane defaults for all the types in kabi/types.h, and archs that require different would #define exceptions in their kabi/arch-foo/types.h. This would allow new archs to start with a sane standard ABI before it becomes set in stone. 2) There is a bunch of stuff that would be _really_ useful in userspace programs as well, even though not kernel ABI, such as list.h, atomic.h (with a few archs modified due to privilege restrictions), etc. If there is interest, I would attempt to split off those headers into a kcore/kerncore/linuxcore/whatever inline header collection included in the linux distribution and installed as part of the kernel headers. 3) What names are preferable for the above? My personal preferences are "kabi" and "kcore", because those save the most typing for the sucker trying to do all this (IE: me), although if someone has good reasons otherwise, I'll listen. 
I realize this project is only slightly short of massive, however I do have a bunch of time and am willing to do the grunt work if enlightened as to the community desires. I have a few different semi-patches almost ready, and I can probably finish up a couple this weekend if I can figure out which way people want to go. One of the major challenges is that kernel files have historically kind of indiscriminately included asm/foo.h when they really meant linux/foo.h (See the types.h example), only to have it magically work because some other header already included linux/types.h anyways. If arch/driver/etc maintainers are willing to take patches to clean that up, I'll start with that and eventually get a decent set of kabi/* headers. Cheers, Kyle Moffett -- Unix was not designed to stop people from doing stupid things, because that would also stop them from doing clever things. -- Doug Gwyn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Splitting out kernel<=>userspace ABI headers
On Sep 2, 2005, at 09:41:09, Erik Andersen wrote: On Thu Sep 01, 2005 at 11:00:16PM -0400, Kyle Moffett wrote: A while ago there was a big discussion about splitting out the userspace-accessible portions of the kernel headers into a separate directory, "kabi", "kernel-abi", "linux-abi", or a half-dozen other suggestions. Linus sprinkled a bit of holy-penguin-pee on the idea, but nothing ever really happened after that. Have you seen the linux-libc-headers: http://ep09.pld-linux.org/~mmazur/linux-libc-headers/ which, while not an official part of the kernel, do a pretty good job... Well, the eventual goal of this project would be to eliminate the need for linux-libc-headers by making that task trivial (IE: Just copy the kcore/ and kabi/ (or whatever they get called) directories into /usr/include. There would probably be some compatibility headers installed into /usr/include/linux until 2.8 is released or 2.7 is forked for some major internal modification, but other than that, the stuff shared by userspace and kernelspace would be only in kcore and kabi, and eventually the linux/* stuff could remove all the __KERNEL__ ifdefs contained therein. Right now linux-libc-headers is maintained by one person at each kernel revision. It would be much better if that maintenance load could be undertaken instead by those who create the code that uses those headers, the kernel developers themselves, because they surely understand it better and are likely to be able to do it more easily and accurately. Cheers, Kyle Moffett -- Unix was not designed to stop people from doing stupid things, because that would also stop them from doing clever things. -- Doug Gwyn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Splitting out kernel<=>userspace ABI headers
On Sep 2, 2005, at 17:55:54, H. Peter Anvin wrote: UML really needs something like this, both 1 and 2. See http://groups.google.com/group/fa.linux.kernel/browse_thread/ thread/34d3c02372861a5c/71816a3c7863ea2b?lnk=st&q=%22jeff+dike% 22&rnum=27&hl=en#71816a3c7863ea2b for my take on system.h and ptrace.h when a change in the host architecture broke the UML build. UML takes most of its headers from the underlying arch. It simplifies things since most of the definitions are usable in UML. I don't have to clone and maintain my versions of all the other arch headers. OTOH, there are things in those headers which UML can't use, and these are eliminated in various ways (undefining them after the include of the host arch header, redefining them before the include). But this is a pain. It has long been my opinion that splitting headers into userspace usable and userspace unusable pieces is the right thing for UML. Less clear for the host arch. Your post seems to indicate that there is a non-UML demand for exactly this. There definitely is. The kernel needs to export its ABI in a way that userspace (UML, various libcs, etc) can import in a sane manner. In addition, the Linux kernel contains a fair bit of architecture-specific support which go well beyond what one can typically find in userspace, and it would be nice to have those. The current linux-libc-headers aren't it, because they have a fair bit of glibc-centric assumptions in those headers. That's part of why klibc doesn't use them. What I would try to do is package up as much architecture/abi knowledge in one place as possible, the former in kcore/kern-core/whatever, the latter in kabi/kern-abi/linux-abi/whatever. I would also try (as much as possible), to make everything in those directories use some kind of prefix guaranteed not to clash with other stuff, so list_add() for example would become _kcore_list_add(). 
The linux kernel headers in such a modified kernel would then just do this to make the kernel code happy: #ifdef __KERNEL__ # define list_add(x,y) _kcore_list_add(x,y) /**/ #endif My far-into-the-future ideal for this is to have a generic vDSO-type library that is compiled into the kernel that provides a collection of architecture-optimized routines available in both kernelspace and userspace by mapping it into each process' address space. Such a library could effectively automatically provide correct and optimized assembly routines for the currently booted CPU/arch/subarch/etc, so that userspace tools could be compiled once and run on an entire family of CPUs without modification. On the other hand, for those applications that need every last ounce of speed (Including parts of the kernel), you could pass appropriate options to the compiler to tell it to inline the assembly routines (alternative) for a single CPU make/model. Possibly some of the generic-arch stuff should be pushed back upstream to GCC, maybe have __builtin_{s,u,i,f}{8,16,32,64,128} types, etc, provided directly by GCC, so we don't have to mess with that so much. We should probably also consider the licensing of headers that are meant to be included into userspace. Userspace still includes a fair bit of GPL headers, which is technically not kosher. I think that this is mostly a nonissue. The copyright holders of the headers/inline assembly/etc should look at perhaps licensing those as LGPL or providing an exception to allow glibc, klibc, etc to link with them. On the other hand, were glibc to use the optimized routines to provide the Standard C Library, programs using said Standard C Library would not be infringing, because just like with the "userspace <=syscall=> kernelspace" boundary, that does not imply that the code is a derived work. IANAL, however, so if you know one who is willing to contribute some time, this might be an interesting issue. 
(Also: What procedure might be required to get some of the stuff relicensed as LGPL? How do we find all significant copyright holders/contributors from whom we need permission?) Thanks for the encouraging posts! It's good to hear that others are interested in the project, because maybe I won't need to do it _all_ myself :-D. I'll take a look at the patches mentioned, to get more of an idea on the various technical issues. Cheers, Kyle Moffett -- Simple things should be simple and complex things should be possible -- Alan Kay - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
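A small sketch of the prefixing idea from the message above, in case it helps the discussion. The list layout mirrors the usual circular doubly-linked list, but everything here (the _kcore_ names, the init helper) is illustrative and not taken from any existing header.

```c
#include <assert.h>
#include <stddef.h>

/* Prefixed list node, as a shared kcore-style header might declare it. */
struct _kcore_list_head {
    struct _kcore_list_head *next, *prev;
};

/* Make `head` an empty circular list pointing at itself. */
static inline void _kcore_list_init(struct _kcore_list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert `entry` right after `head`. */
static inline void _kcore_list_add(struct _kcore_list_head *entry,
                                   struct _kcore_list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Only kernel code gets the short, unprefixed name. */
#ifdef __KERNEL__
# define list_add(x, y) _kcore_list_add(x, y)
#endif
```

Userspace sees only the prefixed names and so cannot collide with an application's own list_add(); kernel code keeps compiling unchanged through the macro.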
Re: [RFC] Splitting out kernel<=>userspace ABI headers
On Sep 2, 2005, at 19:24:22, H. Peter Anvin wrote: Kyle Moffett wrote: My far-into-the-future ideal for this is to have a generic vDSO-type library that is compiled into the kernel that provides a collection of architecture-optimized routines available in both kernelspace and userspace by mapping it into each process' address space. Such a library could effectively automatically provide correct and optimized assembly routines for the currently booted CPU/arch/subarch/etc, so that userspace tools could be compiled once and run on an entire family of CPUs without modification. On the other hand, for those applications that need every last ounce of speed (Including parts of the kernel), you could pass appropriate options to the compiler to tell it to inline the assembly routines (alternative) for a single CPU make/model. I don't see why this should be compiled into the kernel. The kernel already needs those same optimized routines for its own operation (EX: all the ASM alternative() statements). Since userspace wants some of those as well, it would make sense to share them between kernel and userspace and reduce the number of libraries you would need to optimize when adding a new arch. I don't think that we should add optimized assembly for things that _aren't_ needed in the kernel, but it should share what code it does have. A side benefit of the vDSO method is that you would be able to take a standard distro install and have the kernel automatically select the correct vDSO image at runtime, simultaneously optimizing itself and chunks of userspace. Cheers, Kyle Moffett -- Unix was not designed to stop people from doing stupid things, because that would also stop them from doing clever things. -- Doug Gwyn - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Splitting out kernel<=>userspace ABI headers
On Sep 2, 2005, at 20:07:58, H. Peter Anvin wrote: Followup to: <[EMAIL PROTECTED]> By author:Erik Andersen <[EMAIL PROTECTED]> In newsgroup: linux.dev.kernel That would be wonderful. It would be especially nice if everything targeting user space were to use only all the nice standard ISO C99 types as defined in include/stdint.h such as uint32_t and friends... Absolutely not. This would be a POSIX namespace violation; they *must* use double-underscore types. I would actually be more inclined to provide and use types like _kabi_{s,u}{8,16,32,64}, etc. Then the glibc/klibc/etc authors would have the option of just doing "typedef _kabi_u32 uint32_t;" in their header files. Cheers, Kyle Moffett -BEGIN GEEK CODE BLOCK- Version: 3.12 GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$ L(+ ++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+ PGP+++ t+(+++) 5 X R? tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r !y?(-) --END GEEK CODE BLOCK-- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Splitting out kernel<=>userspace ABI headers
On Sep 2, 2005, at 20:34:11, H. Peter Anvin wrote: Kyle Moffett wrote: I would actually be more inclined to provide and use types like _kabi_{s,u}{8,16,32,64}, etc. Then the glibc/klibc/etc authors would have the option of just doing "typedef _kabi_u32 uint32_t;" in their header files. They have to be *double-underscore*. We have that. They're called __[su]{8,16,32,64}.

I realize this completely. The point of moving to kabi/* and kcore/* would be to remove the dependence of userspace-accessible headers on kernel-internal stuff. As I see it, part of that means exporting a reasonably clean and straightforward API from kabi/kcore, including a decent namespace prefix. The goal would be something that the kernel headers could map to types usable in kernel code, that various *libc in userspace could map to POSIX types, and that would have a nice prefix to be namespace clean and avoid the risk of contamination. Given this set of goals, I think that something like the below would probably work and satisfy the needs of both *libc and the kernel:

    /* kcore/types.h */
    typedef unsigned char __kabi_u8;
    typedef signed char __kabi_s8;
    typedef [...]

    /* linux/types.h */
    #include
    #ifndef __KERNEL__
    # warning "Insert some kind of deprecation warning here"
    #endif
    /* These for compatibility only. When the last ABI headers move to kcore or kabi, these should go in __KERNEL__ */
    typedef __kabi_u8 __u8;
    typedef __kabi_s8 __s8;
    [...]
    #ifdef __KERNEL__
    typedef __kabi_u8 u8;
    typedef __kabi_s8 s8;
    #endif

    /* stdint.h */
    #include
    typedef __kabi_u8 uint8_t;
    typedef __kabi_s8 int8_t;
    [...]

Cheers, Kyle Moffett
-- There is no way to make Linux robust with unreliable memory subsystems, sorry. It would be like trying to make a human more robust with an unreliable O2 supply. Memory just has to work.
-- Andi Kleen
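The layering sketched in that message can be checked with a compiler. Below is a compilable approximation collapsed into one file, with several assumptions: the widths hold for the common ILP32/LP64 ABIs only, the 16/32/64-bit typedefs are my own extension of the "[...]" in the sketch, and the short names are defined unconditionally here just to keep the example self-contained.

```c
#include <assert.h>
#include <limits.h>

/* kabi-style base types (widths assume a typical ILP32 or LP64 ABI). */
typedef unsigned char      __kabi_u8;
typedef signed char        __kabi_s8;
typedef unsigned short     __kabi_u16;
typedef unsigned int       __kabi_u32;
typedef unsigned long long __kabi_u64;

/* Fail the build early on an ABI where the assumptions above are wrong. */
_Static_assert(sizeof(__kabi_u16) * CHAR_BIT == 16, "__kabi_u16 is not 16 bits");
_Static_assert(sizeof(__kabi_u32) * CHAR_BIT == 32, "__kabi_u32 is not 32 bits");
_Static_assert(sizeof(__kabi_u64) * CHAR_BIT == 64, "__kabi_u64 is not 64 bits");

/* What a consumer (kernel or libc) would layer on top, in its own namespace. */
typedef __kabi_u8  u8;
typedef __kabi_u32 u32;

/* Example user: pack four bytes into one 32-bit value, most significant first. */
static u32 pack_be32(u8 b0, u8 b1, u8 b2, u8 b3)
{
    return ((u32)b0 << 24) | ((u32)b1 << 16) | ((u32)b2 << 8) | (u32)b3;
}
```

The point of the exercise is that neither the base typedefs nor their users need <stdint.h>, so the shared header makes no assumption about what the including program has or has not defined.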
Re: [RFC] Splitting out kernel<=>userspace ABI headers
On Sep 3, 2005, at 00:28:59, Erik Andersen wrote: Absolutely not. This would be a POSIX namespace violation; they *must* use double-underscore types. I assume you are worried about the stuff under asm that ends up being included by nearly every header file in the world. Of course asm must use double-underscore types. But the thing is, the vast majority of the kernel headers live under linux/include/linux/ and do not use double-underscore types, they use kernel specific, non-underscored types such as s8, u32, etc. My copy of IEEE 1003.1 and my copy of ISO/IEC 9899:1999 both fail to prohibit using the shiny new ISO C99 type for the various #include header files, which is what I was suggesting. Anything in linux/* that is included by userspace should not presume that stdint.h has already been included or include it on its own, because the userspace program may have already made its own definitions of uint32_t, or it may not want them defined at all. The world would be so much nicer a place if user space were free to #include linux/* header files rather than keeping a per-project private copy of all kernel structs of interest. Exactly! This is why I want to create kcore/* and kabi/* that define the appropriate types, then both userspace and the kernel could use whatever types fit their fancy, defined in terms of the __kcore_ and __kabi_ types, which could be _depended_ on to exist because they are guaranteed not to conflict with other namespaces Cheers, Kyle Moffett -- I have yet to see any problem, however complicated, which, when you looked at it in the right way, did not become still more complicated. -- Poul Anderson - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Splitting out kernel<=>userspace ABI headers
On Sep 3, 2005, at 01:57:26, H. Peter Anvin wrote: Kyle Moffett wrote: The world would be so much nicer a place if user space were free to #include linux/* header files rather than keeping a per-project private copy of all kernel structs of interest. Exactly! This is why I want to create kcore/* and kabi/* that define the appropriate types, then both userspace and the kernel could use whatever types fit their fancy, defined in terms of the __kcore_ and __kabi_ types, which could be _depended_ on to exist because they are guaranteed not to conflict with other namespaces Agreed. We should use well-defined namespaces that won't conflict. However, I think the __[us][0-9]+ namespace can be considered well-established. True, however, IMNSHO it would be much better if the kcore/kabi stuff had a _consistent_ namespace as well. If every macro begins with "__KABI_" and every type and function with "__kabi_" (With a few function-like macro exceptions, of course), then it is trivial to see where it originally came from and provides a standard naming scheme that external parties can kind of rely upon. It also means there are fewer exceptions to remember when coding. My thought for the __[us][0-9]+ types is that they should still be defined in linux/types.h for compatibility (outside of __KERNEL__) and based off the __kabi_* types. Cheers, Kyle Moffett -- There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult. -- C.A.R. Hoare - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Splitting out kernel<=>userspace ABI headers
On Sep 3, 2005, at 11:36:22, Denis Vlasenko wrote: Is this an exercise in academia? Userspace app which defines uint32_t to anything different than 'typedef ' deserves the punishment, and one which does have such typedef instead of #include stdint.h will not notice. That's not the issue. Say I do this (which is perfectly valid on most platforms): typedef unsigned int uint32_t; #include What exactly should happen? If linux/loop.h includes stdint.h to get uint32_t, then I'll get duplicate definition errors. If it omits stdint.h, then uint16_t won't be defined (because the userspace app doesn't think that it needs it) and I'll get undefined type errors. Either way, depending on the existence or nonexistence of the POSIX types in userspace-accessible kernel headers is not viable. All these u32, uint32_t, __u32 end up typedef-ing to same integer type anyway... The point is to provide a type that _isn't_ in some standard so that _we_ can define its inclusion rules. If the standards had gone and defined "Userspace must include stdint.h or define _all_ types appropriately", then we would not have had this issue, but many apps in userspace would cease to compile on standards compliant platforms. Cheers, Kyle Moffett -- Somone asked me why I work on this free (http://www.fsf.org/philosophy/) software stuff and not get a real job. Charles Shultz had the best answer: "Why do musicians compose symphonies and poets write poems? They do it because life wouldn't have any meaning for them if they didn't. That's why I draw cartoons. It's my life." -- Charles Shultz - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Splitting out kernel<=>userspace ABI headers
On Sep 3, 2005, at 11:19:17, H. Peter Anvin wrote: Thus, an ABIzed or whatever it's called might export "struct __kabi_stat" and "struct __kabi_stat64" with the expectation that the caller would "#define __kabi_stat64 stat" if that is the version they want. A typedef isn't good enough for C, since you can't typedef struct tags.

Didn't you mean "#define stat __kabi_stat64"? Also, I can see that would pose other issues as well: say my app does "struct stat stat;". Any error messages would refer to a variable "__kabi_stat64" instead of the expected "stat".

A userspace program:

    struct stat stat;
    stat.invalid = 1;

Preprocesses into:

    struct __kabi_stat64 __kabi_stat64;
    __kabi_stat64.invalid = 1;

And gives an error something like this for that line, confusing the programmer:

    Invalid member "invalid" for "__kabi_stat64"

As far as I can tell, this is not a solvable issue unless GCC can come up with a way to either:

    typedef struct foo struct bar;

or

    struct bar { unnamed struct foo; };

the former being much nicer. On the other hand, I think the following should work, because the st_* names are within the C namespace and should be much easier to redefine, although misuse of one of those names might be a bit more catastrophic for the user app.

    struct stat { struct __kabi_stat64 __stat64; };
    #define st_dev __stat64.st_dev
    #define st_ino __stat64.st_ino
    [...]

Then the userspace program could do this:

    struct stat foo;
    foo.st_ino = 0;

And it would be preprocessed into:

    struct stat foo;
    foo.__stat64.st_ino = 0;

Cheers, Kyle Moffett
-- Someone asked me why I work on this free (http://www.fsf.org/philosophy/) software stuff and not get a real job. Charles Shultz had the best answer: "Why do musicians compose symphonies and poets write poems? They do it because life wouldn't have any meaning for them if they didn't. That's why I draw cartoons. It's my life."
-- Charles Shultz
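The wrapper-plus-member-macros variant from that message does compile. A cut-down sketch follows, with assumptions flagged: the two-field layout of struct __kabi_stat64 is invented, and the wrapper is called struct user_stat rather than struct stat so the example cannot clash with a real <sys/stat.h>.

```c
#include <assert.h>

/* Stand-in for the kernel-exported structure; the fields are hypothetical. */
struct __kabi_stat64 {
    unsigned long long st_dev;
    unsigned long long st_ino;
};

/* Userspace-visible structure wrapping the exported one. */
struct user_stat {
    struct __kabi_stat64 __stat64;
};

/* Member macros; these must come after the struct definitions above. */
#define st_dev __stat64.st_dev
#define st_ino __stat64.st_ino

/* Ordinary code just uses the familiar member names. */
static unsigned long long user_stat_inode(const struct user_stat *s)
{
    return s->st_ino;  /* expands to s->__stat64.st_ino */
}
```

The "catastrophic misuse" caveat is real: once the macros are defined, any other use of the token st_ino, including a hand-written foo.__stat64.st_ino, is rewritten as well and stops compiling.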
Re: RFC: i386: kill !4KSTACKS
On Sep 4, 2005, at 23:41:58, Alex Davis wrote: --- Sean <[EMAIL PROTECTED]> wrote: It's not a philosophical issue, it's what Linux _is_: an open source operating system! That's what the developers are working on; not your half-baked vision. Um, ever hear of 'compromise'?? All I'm saying is let people use what currently works until we can get an open-source solution. Ndiswrapper's existence is not stopping you (or anyone else) from pestering manufacturers for spec's and writing drivers. I look at ndiswrapper as a stop-gap solution. Hey, even Linus himself has said 'better a sub-optimal solution than no solution'. In any case, this discussion is moot because the kernel API is changing for the better and there is a clearly defined fix for ndiswrapper that will allow it to continue to work even with the new interface: allocate a separate ndiswrapper stack (IE: Not the kernel stacks). The kernel is under no obligation not to break out-of-tree drivers, etc, even semi- non- -binary-only ones such as ndiswrapper. Figure out how to fix it and move on! Cheers, Kyle Moffett -- Q: Why do programmers confuse Halloween and Christmas? A: Because OCT 31 == DEC 25. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RFC: i386: kill !4KSTACKS
On Sep 5, 2005, at 18:32:32, Thorild Selen wrote: Adrian Bunk <[EMAIL PROTECTED]> writes: Please name situations where 8K stacks may be preferred that do not involve binary-only modules. How about NFS-exporting a filesystem on LVM atop md? I believe it has been mentioned before in discussions that 8k stacks are strongly recommended in this case. Are those issues solved? I think the worst overflow case anyone found was nfs=>xfs=>lvm=>dm=>scsi, if someone has such a configuration, please retest with current -mm or similar. I think there are several patches in there to resolve the excessive stack usage and a few to do some sort of bio chaining (Instead of recursive calls). I don't remember what underlying hardware was behind the SCSI, but I suspect something like iSCSI or USB would push some extra stack in there for stress testing. Cheers, Kyle Moffett -- I have yet to see any problem, however complicated, which, when you looked at it in the right way, did not become still more complicated. -- Poul Anderson - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Splitting out kernel<=>userspace ABI headers
On Sep 5, 2005, at 12:35:42, H. Peter Anvin wrote: Followup to: <[EMAIL PROTECTED]> By author: Kyle Moffett <[EMAIL PROTECTED]> In newsgroup: linux.dev.kernel Didn't you mean "#define stat __kabi_stat64"? Also, I can see that would pose other issues as well say my app does "struct stat stat;" Any error messages would refer to a variable "__kabi_stat64" instead of the expected "stat": No, I didn't. That's *exactly* why I didn't mean that. #define __kabi_stat64 stat #include That being said, I would personally like to see it possible to typedef struct, union and enum tags. _OH_!!! Forgive me for missing the point entirely! I can see how that would work very well. Nice trick, BTW! Very sneaky, needs significant explanatory comments in whatever header file it ends up in lest others get confused in the same fashion as I. With all of that mess out of the way, I'll work on getting a few initial RFC patches out the door, and then we can revisit this discussion once there is something tangible to talk about. Cheers, Kyle Moffett -- Somone asked me why I work on this free (http://www.fsf.org/philosophy/) software stuff and not get a real job. Charles Shultz had the best answer: "Why do musicians compose symphonies and poets write poems? They do it because life wouldn't have any meaning for them if they didn't. That's why I draw cartoons. It's my life." -- Charles Shultz - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
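Since the trick is easy to misread (as the message above admits), here is a minimal single-file sketch of it. The header contents are inlined and hypothetical, and the plain name is user_stat instead of stat only to avoid clashing with a real <sys/stat.h>.

```c
#include <assert.h>

/*
 * The caller picks which exported layout gets the plain name by defining
 * the macro BEFORE the ABI header is included.
 */
#define __kabi_stat64 user_stat

/* ---- begin: contents of the hypothetical ABI header ---- */
struct __kabi_stat {            /* legacy layout, keeps its prefixed tag */
    unsigned int st_ino;
};
struct __kabi_stat64 {          /* the tag expands to: struct user_stat */
    unsigned long long st_ino;
};
/* ---- end: contents of the hypothetical ABI header ---- */

/* From here on the structure really is named user_stat. */
static unsigned long long inode_of(const struct user_stat *s)
{
    return s->st_ino;
}
```

Because the structure itself is declared under the plain tag, variables named after it and compiler diagnostics refer to the name the programmer expects, which is what the reversed #define could not provide.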
[RFC][MEGAPATCH] Change __ASSEMBLY__ to __ASSEMBLER__ (defined by GCC from 2.95 to current CVS)
On Sep 5, 2005, at 19:28:07, Kyle Moffett wrote: With all of that mess out of the way, I'll work on getting a few initial RFC patches out the door, and then we can revisit this discussion once there is something tangible to talk about. Ugh. Step one for my cleanup is to rename __ASSEMBLY__ to something defined automatically by GCC (IE: __ASSEMBLER__). And yes, I checked, __ASSEMBLER__ is defined by everything from old 2.95 to 4.0, even though it wasn't really documented in anything older than 3.4. This megapatch is basically a search and replace of __ASSEMBLY__ with __ASSEMBLER__ over the whole kernel source, except in Makefiles, where I just delete the -D__ASSEMBLY__ argument. If this is generally acceptable, I'll break it up into small digestible pieces and send to individual maintainers, unless someone wants to pass the whole monster through their tree in one big lump. This is a lot of code churn, but it's a valid cleanup and will help me out as I try to make more of the kernel headers easily digestible for userspace. Ok, the patch itself is temporarily located here (Please be nice to my desktop, it has a 650MB/day upload limit imposed by Virginia Tech that I'd rather not go over) [patch is 308k]: http://zeus.moffetthome.net/~kyle/rename-__ASSEMBLY__-to-__ASSEMBLER__.patch And here's the diffstat [27k]: http://zeus.moffetthome.net/~kyle/rename-__ASSEMBLY__-to-__ASSEMBLER__.diffstat Cheers, Kyle Moffett -- Unix was not designed to stop people from doing stupid things, because that would also stop them from doing clever things. -- Doug Gwyn
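For anyone who has not run into the macro: the usual pattern is a header shared between C and .S files that hides its C-only parts from the assembler. A minimal sketch, with invented names (DEMO_PAGE_SHIFT, demo_page_size); GCC predefines __ASSEMBLER__ when preprocessing assembly, so no -D flag is needed.

```c
#include <assert.h>

/* Plain constants are safe for both C and assembly sources. */
#define DEMO_PAGE_SHIFT 12

#ifndef __ASSEMBLER__
/* C-only section: the assembler would choke on declarations like this. */
static inline unsigned long demo_page_size(void)
{
    return 1UL << DEMO_PAGE_SHIFT;
}
#endif /* !__ASSEMBLER__ */
```

A .S file including the same header would see only DEMO_PAGE_SHIFT, which is exactly what the hand-passed -D__ASSEMBLY__ arranges today.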
Re: Modifying Cryptography code
On Sep 6, 2005, at 08:38:48, Alaa Dalghan wrote:
> What I am looking for is the portion of the C code in the kernel where the Decryption function is called to decrypt a received packet. When I find this statement, maybe I can make it conditional such as: If the destination is me then Decrypt else DO NOT!

You can't make this work. First of all, the other WinXP clients would be completely unable to decrypt your packets, because they don't have the right key. Secondly, the kernel cannot know what the destination is until *after* it has decrypted the packet, because the real target address is encrypted along with the rest of the data for security.

If your OpenSwan box is too slow, get a faster OpenSwan box; don't try to break the encryption to make it faster. You cannot remove enough encryption features to get the required extra speed without disabling the encryption entirely.

Cheers,
Kyle Moffett

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$ L(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+ PGP+++ t+(+++) 5 X R? tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r !y?(-)
------END GEEK CODE BLOCK------
Re: [ham] Re: Gracefully killing kswapd, or any kernel thread
On Sep 7, 2005, at 17:07:12, Kristis Makris wrote:
>> To kill a kernel thread, you need to make __it__ call exit(). It must be
>
> There must be another way to do it. Perhaps one could have another process effectively issue the contents of do_exit for the kswapd task_struct ?

Umm, so then the kernel does what, exactly? You have a process in some indeterminate state, possibly holding semaphores, definitely pinning memory/resources/etc, and you just stop it, turn it off, and expect things to continue working? This is similar in nature to that thread a while ago about kernel error recovery and killing uninterruptible user processes. To extend this to kernel threads, unless the kernel thread has been _specifically_ coded to be interruptible, it isn't, and furthermore, *can't* be.

>> CODED to do that! You can't do it externally although you can send
>
> I'm clearly asking for the case where the thread wasn't coded to do that.

You can't. This is flatly impossible. Go see the thread a while back about a hot-patch system call for several reasons why that is a bad idea. In particular, look at the post that discusses phone switches, the one with the quote "'So why don't you just reboot the affected switches?' [...] 'That assumes the switches had ever been booted in the first place'".

>> it a signal, after which it will spin forever
>
> kflushd and keventd don't seem to spin forever. I still haven't determined what makes kswapd spin forever after it receives the signal.

Probably a while(1) loop that isn't intended to stop until the machine physically powers off. If you want to patch one specific kernel thread, you might be able to do that, but you can't just expect to hot-patch random parts of the kernel at runtime and have things work.

Cheers,
Kyle Moffett
Re: freeze vs freezer
On Jan 04, 2008, at 15:54:06, Oliver Neukum wrote:
> On Thursday, 3 January 2008 23:06:07, Nigel Cunningham wrote:
>> Hi.
>>> a) mount fuse on /tmp/first
>>> b) mount fuse on /tmp/second
>>> Then the server task for (a) does "ls /tmp/second". So it will be frozen, right? How do you then freeze (a)? And keep in mind that the server task may have forked.
>>
>> I guess I should first ask, is this a real life problem or a hypothetical twisted web? I don't see why you would want to make two filesystems interdependent - it sounds like the way to create livelock and deadlocks in normal use, before we even begin to think about hibernating.
>
> Good questions. I personally don't use fuse, but I do care about power management. The problem I see is that an unprivileged user could make that dependency, even inadvertently.

I don't think it makes sense for the kernel to try to keep track of hard data dependencies for FUSE filesystems, or to even *attempt* to auto-suspend them. You should instead allow a privileged program to initiate a "freeze-and-flush" operation on a particular FUSE filesystem and optionally wait for it to finish. Then your userspace would be configured with the appropriate data dependencies and would stop FUSE filesystems in the appropriate order.

In addition, the kernel would automatically understand ext3=>loopback=>fuse, and when asked to freeze the "fuse" part, it would first freeze the "ext3" and the "loopback" parts using mechanisms similar to those device-mapper currently uses when you do "dmsetup suspend mydev" followed by "echo 0 $SIZE snapshot /dev/mapper/mydev-base /dev/mapper/mydev-snap-back p 8 | dmsetup load mydev" (IE: when you create a snapshot of a given device).

Naturally userspace could deadlock itself (although not the kernel) by freezing a block device and then attempting to access it, but since the "freeze" operation is limited to root this is not a big issue. The way to freeze all filesystems safely would be to clone a new mount namespace, mlockall(), mount a tmpfs, pivot_root() into the tmpfs, bind-mount the filesystems you want to freeze directly onto subdirectories of the tmpfs, and then freeze them in an appropriate order.

Besides which, the worst case is a pretty straightforward non-critical failure; you might fail to fully sync a FUSE filesystem because its daemon is asleep waiting on something (possibly even just sitting in a "sleep(1)" call with all signals masked). You simply need to make sure that all tasks are asleep outside of driver critical sections so that you can properly suspend your device tree.

Cheers,
Kyle Moffett
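The "appropriate order" mentioned above amounts to a topological ordering of the dependency graph: a filesystem can be frozen once everything that uses it is already frozen. The following is only a sketch of that idea, not anything from the thread; the function name, the fixed MAX_FS bound and the adjacency-matrix representation are all assumptions made for illustration. It also reports the mutually dependent case from the quoted FUSE scenario as an error rather than looping.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FS 8   /* arbitrary bound for this sketch */

/* uses[i][j] != 0 means filesystem i depends on (is stacked on or reads
 * from) filesystem j.  On success, order[] lists the indices in a safe
 * freeze order, users before the things they use, and 0 is returned.
 * Returns -1 if n is too large or the dependencies form a cycle. */
int freeze_order(size_t n, int uses[MAX_FS][MAX_FS], size_t order[MAX_FS])
{
    size_t pending_users[MAX_FS] = { 0 };
    int frozen[MAX_FS] = { 0 };
    size_t count = 0;

    assert(uses != NULL && order != NULL);
    if (n > MAX_FS)
        return -1;

    /* Count how many not-yet-frozen filesystems use each one. */
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (uses[i][j])
                pending_users[j]++;

    while (count < n) {
        int progressed = 0;

        for (size_t j = 0; j < n; j++) {
            if (frozen[j] || pending_users[j] != 0)
                continue;
            /* Nothing unfrozen depends on j any more: freeze it. */
            frozen[j] = 1;
            order[count++] = j;
            for (size_t k = 0; k < n; k++)
                if (uses[j][k])
                    pending_users[k]--;
            progressed = 1;
        }
        if (!progressed)
            return -1;   /* circular dependency, e.g. two FUSE mounts */
    }
    return 0;
}
```

With index 0 = ext3, 1 = loopback, 2 = fuse and the edges ext3->loopback->fuse, the computed order is ext3, loopback, fuse, matching the stacking example in the message.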
Re: The ext3 way of journalling
On Jan 08, 2008, at 15:51:53, Andi Kleen wrote:
> Theodore Tso <[EMAIL PROTECTED]> writes:
>> Now, there are good reasons for doing periodic checks every N mounts and after M months. And it has to do with PC class hardware. (Ted's aphorism: "PC class hardware is cr*p").
>
> If these reasons are good ones (some skepticism here) then the correct way to really handle this would be to do regular background scrubbing during runtime; ideally with metadata checksums so that you can actually detect all corruption.

Poor man's background scrubbing:

(A) Use LVM like virtually all modern distros offer
(B) Leave some extra space in your LVM volume group (enough for 1 snapshot over the time it takes to do an FSCK).
(C) Periodically run the following scriptlet:

    set -e
    # VG, VOLUME and SNAP_SIZE are expected to be set by the caller.
    START="$(date +'%Y%m%d%H%M%S')"
    # A snapshot of an ordinary (non-thin) volume needs an explicit size.
    lvcreate -s -L "${SNAP_SIZE}" -n "${VOLUME}-snap" "${VG}/${VOLUME}"
    # /dev/VG/LV is used here; the /dev/mapper names escape hyphens.
    if nice -n 19 fsck -fy "/dev/${VG}/${VOLUME}-snap"; then
        echo 'Background scrubbing succeeded!'
        tune2fs -T "${START}" "/dev/${VG}/${VOLUME}"
    else
        echo 'Background scrubbing failed! Reboot to fsck soon!'
        tune2fs -C 16383 -T "19000101" "/dev/${VG}/${VOLUME}"
    fi
    lvremove -f "${VG}/${VOLUME}-snap"

Basically you can fsck the offline snapshot in the background. If it succeeds you can adjust the "last checked" date to the time when the snapshot was taken, and if it fails you can schedule an FSCK at next reboot (and possibly remount the filesystem read-only or reboot immediately). You can do the same thing for your /boot volume, although you probably have to manually use dmsetup since most bootloaders can't interpret LVM volumes.

I've always been surprised that distros like RedHat which automatically use LVM don't stuff this in their weekly or monthly checks on desktop systems. User experience could also be dramatically improved with automated smartd configuration and user-interactive logging and warning messages.

> But since fsck is so slow and disks are so big this whole thing is a ticking time bomb now. e.g. it is not uncommon to require tens of minutes or even hours of fsck time and some server that reboots only every few months will eat that when it happens to reboot. This means you get a quite long downtime.

My servers all have an "interval-between-checks" of 2-6 weeks and are configured to run nice +20 background "fsck" checks during off-hours between once every few days and once every few weeks. I also have the "max mount count" numbers set to primes between 7 and 37 (depending on the filesystem) so that troubled or frequently-rebooted systems are more frequently verified.

The end result is that I almost never have the dreaded 4-hour-fsck-on-boot problem. A drive has certainly been fscked within the last few weeks of operation, and I will only very rarely have multiple large filesystems all fscked at the same time (once per least common multiple of their max-mount-counts).

Cheers,
Kyle Moffett
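The arithmetic behind choosing prime max-mount-counts can be made concrete. Assuming two filesystems are mounted once per boot and their mount counters start together (a simplification introduced here, not stated in the message), both hit their limit on the same boot once every least common multiple of the two counts, which for distinct primes is simply their product. The helper names below are hypothetical.

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm. */
static unsigned long example_gcd(unsigned long a, unsigned long b)
{
    while (b != 0) {
        unsigned long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Number of boots between occasions on which both filesystems reach
 * their max mount count together: lcm(count_a, count_b). */
unsigned long boots_between_joint_checks(unsigned long count_a,
                                         unsigned long count_b)
{
    assert(count_a > 0 && count_b > 0);
    return count_a / example_gcd(count_a, count_b) * count_b;
}
```

For example, counts of 7 and 37 coincide only every 259 boots, whereas counts of 20 and 30 coincide every 60 boots and identical counts of 30 coincide every time.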
Re: yield API
On Dec 12, 2007, at 17:39:15, Jesper Juhl wrote:
> On 02/10/2007, Ingo Molnar <[EMAIL PROTECTED]> wrote:
>> sched_yield() has been around for a decade (about three times longer than futexes were around), so if it's useful, it sure should have grown some 'crown jewel' app that uses it and shows off its advantages, compared to other locking approaches, right?
>
> I have one example of sched_yield() use in a real app. Unfortunately it's proprietary so I can't show you the source, but I can tell you how it's used.
>
> The case is this: Process A forks process B. Process B does some work that takes approximately between 50 and 1000ms to complete (varies), then it creates a file and continues to do other work. Process A needs to wait for the file B creates before it can continue. Process A *could* immediately go into some kind of "check for file; sleep n ms" loop, but instead it starts off by calling sched_yield() to give process B a chance to run and hopefully get to the point where it has created the file before process A is again scheduled and starts to look for it - after the single sched yield call, process A does indeed go into a "check for file; sleep 250ms;" loop, but most of the time the initial sched_yield() call actually results in the file being present without having to loop like that.

That is a *terrible* disgusting way to use yield. Better options:
(1) inotify/dnotify
(2) create a "foo.lock" file and put the mutex in that
(3) just start with the check-file-and-sleep loop.

> Now is this the best way to handle this situation? No. Does it work better than just doing the wait loop from the start? Yes.

It works better than doing the wait-loop from the start? What evidence do you provide to support this assertion? Specifically, in the first case you tell the kernel "I'm waiting for something but I don't know what it is or how long it will take"; while in the second case you tell the kernel "I'm waiting for something that will take exactly X milliseconds, even though I don't know what it is". If you really want something similar to the old behavior then just replace the "sched_yield()" call with a proper sleep for the estimated time it will take the program to create the file.

> Is this a good way to use sched_yield()? Maybe, maybe not. But it *is* an actual use of the API in a real app.

We weren't looking for "actual uses", especially not in binary-only apps. What we are looking for is optimal uses of sched_yield(); ones where that is the best alternative. This... certainly isn't.

Cheers,
Kyle Moffett
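Option (3) above, the plain check-file-and-sleep loop, might look roughly like the following. This is a sketch rather than the application's actual code (which is proprietary and not shown in the thread); the function name, the retry limit and the use of access() and nanosleep() are choices made here for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>
#include <unistd.h>

/* Poll for `path` up to `max_tries` times, sleeping `interval_ms`
 * milliseconds between attempts.  Returns 0 once the file exists,
 * or -1 if it never appeared. */
int wait_for_file(const char *path, long interval_ms, int max_tries)
{
    struct timespec delay;

    assert(path != NULL && interval_ms >= 0);
    delay.tv_sec = interval_ms / 1000;
    delay.tv_nsec = (interval_ms % 1000) * 1000000L;

    for (int attempt = 0; attempt < max_tries; attempt++) {
        if (access(path, F_OK) == 0)
            return 0;
        /* Tell the kernel exactly how long we intend to wait. */
        if (attempt + 1 < max_tries)
            nanosleep(&delay, NULL);
    }
    return -1;
}
```

A caller in the scenario described would use something like wait_for_file(path, 250, tries), optionally preceded by one sleep for the estimated time process B needs, instead of a sched_yield() call.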
Re: [PATCH 5/9] bfs: move function prototype to the proper header file
On Jan 24, 2008, at 18:13, Dmitri Vorobiev wrote:
> Heikki Orsila wrote:
>> On Fri, Jan 25, 2008 at 01:32:04AM +0300, Dmitri Vorobiev wrote:
>>> +/* inode.c */
>>> +extern void dump_imap(const char *, struct super_block *);
>>> +
>>
>> Functions should not be externed, remove extern keyword.
>
> Care to explain why? Following is an explanation why the contrary is probably true:
>
> 1) We have lots of precedents in existing code:
>
> [EMAIL PROTECTED]:~/Projects/misc/linux$ git-grep 'extern void' include | wc -l
> 5523
> [EMAIL PROTECTED]:~/Projects/misc/linux$

The "extern" keyword on functions is *completely* redundant.

For C variables:
    Declaration: extern int foo;
    Definition:  int foo;
    File-scoped: static int foo;

For C functions:
    Declaration: void foo(int x);
    Definition:  void foo(int x) { /*...body...*/ }
    File-scoped: static void foo(int x) { /*...body...*/ }

The compiler will *allow* you to use "extern" on the function prototype, but the presence or absence of a function body already tells it whether the prototype is a declaration or a definition, so the "extern" keyword is not required and is therefore redundant.

For maximum readability and cleanliness I recommend that you leave off the "extern" on the function declarations; it makes the lines much longer without obvious gain.

Cheers,
Kyle Moffett
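The distinction drawn above can be checked with a small translation unit. The identifiers are hypothetical; the point is only that the two function prototypes are interchangeable, while for the variable the "extern" is what keeps the first line a declaration.

```c
#include <assert.h>

/* Variable: "extern" makes this a declaration rather than a definition. */
extern int example_counter;

/* Function: these two prototypes declare exactly the same thing, so the
 * compiler accepts both in one file without complaint. */
extern int example_add_one(int x);
int example_add_one(int x);

/* Definitions. */
int example_counter = 41;

int example_add_one(int x)
{
    return x + 1;
}

/* Small usage helper for this sketch. */
void example_check(void)
{
    assert(example_add_one(example_counter) == 42);
}
```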
Re: [PATCH] Version 3 (2.6.23-rc8) Smack: Simplified Mandatory Access Control Kernel
g there. Perhaps next time I'm bored. I think a fair amount of what we need is already done in SELinux, and efforts would be better spent in figuring out what seems too complicated in SELinux and making it simpler. Probably a fair amount of that just means better tools.

Cheers,
Kyle Moffett
Re: [PATCH] Version 3 (2.6.23-rc8) Smack: Simplified Mandatory Access Control Kernel
On Oct 05, 2007, at 00:45:17, Eric W. Biederman wrote:
> Kyle Moffett <[EMAIL PROTECTED]> writes:
>> On Oct 04, 2007, at 21:44:02, Eric W. Biederman wrote:
>>> SElinux is not all encompassing or it is generally incomprehensible I don't know which. Or someone long ago would have said a better way to implement containers was with a selinux ruleset, here is a selinux ruleset that does that. Although it is completely possible to implement all of the isolation with the existing LSM hooks as Serge showed.
>>
>> The difference between SELinux and containers is that SELinux (and LSM as a whole) returns -EPERM to operations outside the scope of the subject, whereas containers return -ENOENT (because it's not even in the same namespace).
>
> Yes. However if you look at what the first implementations were. Especially something like linux-vserver. All they provided was isolation. So perhaps you would not see every process ps but they all had unique pid values. I'm pretty certain Serge at least prototyped a simplified version of that using the LSM hooks. Is there something I'm not remembering in those hooks that allows hiding of information like processes? Yes. Currently with containers we are taking that one step farther as that solves a wider set of problems.

IMHO, containers have a subtly different purpose from LSM even though both are about information hiding. Basically a container is information hiding primarily for administrative reasons; either as a convenience to help prevent errors or as a way of describing administrative boundaries. For example, even in an environment where all sysadmins are trusted employees, a few head-honcho sysadmins would get root container access, and all others would get access to specific containers as a way of preventing "oops" errors. Basically a container is about "full access inside this box and no access outside".

By contrast, LSM is more strictly about providing *limited* access to resources. For an accounting business all client records would be grouped and associated together; however, those which have passed this year's review are read-only except by specific staff, and others may have information restricted to some subset of the employees.

So containers are exclusive subsets of "the system" while LSM should be about non-exclusive information restriction.

>>> We also have in the kernel another parallel security mechanism (for what is generally a different class of operations) that has been quite successful, and different groups get along quite well, and ordinary mortals can understand it. The linux firewalling code.
>>
>> Well, I wouldn't go so far as the "ordinary mortals can understand it" part; it's still pretty high on the obtuse-o-meter.
>
> True. Probably a more accurate statement is: unix command line power users can and do handle it after reading the docs. That's not quite ordinary mortals but it feels like it some days. It might all be perception...

I have seen more *wrong* iptables firewalls than I've seen correct ones. Securing TCP/IP traffic properly requires either a lot of training/experience or a good out-of-the-box system like Shorewall which structures the necessary restrictions for you based on an abstract description of the desired functionality. For instance, what percentage of admins do you think could correctly set up their netfilter firewalls to log christmas-tree packets, smurfs, etc without the help of some external tool? Hell, I don't trust myself to reliably do it without a lot of reading of docs and testing, and I've been doing netfilter firewalls for a while.

The bottom line is that with iptables it is *CRITICAL* to have a good set of interface tools to take the users' "My system is set up like..." description in some form and turn it into the necessary set of efficient security rules. The *exact* same issue applies to SELinux, with 2 major additional problems:

1) Half the tools are still somewhat beta-ish and under heavy development. Furthermore the semi-official reference policy is nowhere near comprehensive and pretty ugly to read (go back to the point about the tools being beta-ish).

2) If you break your system description or translation tools then instead of just your network dying your entire *system* dies.

>>> The linux firewalling code has hooks all throughout the networking stack, just like the LSM has hooks all throughout the rest of linux kernel. There is a difference however. The linux firewalling code in addition to hooks has tables behind those hooks that it consults. There is generic code to walk those tables and consult with different kernel modules to decide if we should drop a packet. Each of those kernel modules provides a different capability that can be used to genera
Re: [PATCH] Replace __attribute_pure__ with __pure
Trimmed the CC list a bit

On Oct 05, 2007, at 20:51:21, H. Peter Anvin wrote:
> Ralf Baechle wrote:
>> To be consistent with the use of attributes in the rest of the kernel replace all use of __attribute_pure__ with __pure and delete the definition of __attribute_pure__.
>
> Concern: __attribute_pure__ is very similar to __attribute_const__, which is almost completely, but not totally unlike the keyword "const"...

Yes, there's also the fact that __pure is a reserved GCC keyword. Essentially, according to GCC docs, all of the GCC-specific keywords are equivalently defined as "keyword", "__keyword", and "__keyword__", with only the latter two defined in strict-ANSI mode. The following is valid according to GCC docs:

static int __attribute__((__pure)) my_strlen(const char *str);

With the proposed definition of __pure, that becomes a noticeably invalid __attribute__((__attribute__((__pure__))))

Cheers,
Kyle Moffett
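One way to sidestep the collision described above is to give the wrapper macro a name that never appears inside an attribute list and to spell the attribute itself with double underscores on both sides. The sketch below assumes GCC or a compatible compiler; EXAMPLE_PURE and example_strlen are made-up names, and this is not the definition the kernel patch proposed.

```c
#include <assert.h>
#include <stddef.h>

/* The "__pure__" spelling cannot be clobbered by a macro named "pure",
 * and EXAMPLE_PURE is never written inside __attribute__((...)). */
#define EXAMPLE_PURE __attribute__((__pure__))

/* A pure function: its result depends only on its arguments and on
 * memory it reads, and it has no side effects. */
EXAMPLE_PURE size_t example_strlen(const char *str);

size_t example_strlen(const char *str)
{
    size_t len = 0;

    while (str[len] != '\0')
        len++;
    return len;
}

/* Small usage helper for this sketch. */
void example_strlen_check(void)
{
    assert(example_strlen("pure") == 4);
}
```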
Re: idio{,ma}tic typos (was Re: + fix-vm_can_nonlinear-check-in-sys_remap_file_pages.patch added to -mm tree)
On Oct 11, 2007, at 03:35:37, Alexey Dobriyan wrote:
> Sadly, yes.
>
> [PATCH] smctr: fix "|| 0x" typo
>
> IBM_PASS_SOURCE_ADDR is 1, so logically ORing it with status bits is pretty useless. Do bitwise OR, instead.
>
> Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
> ---
>
>  drivers/net/tokenring/smctr.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/drivers/net/tokenring/smctr.c
> +++ b/drivers/net/tokenring/smctr.c
> @@ -3413,7 +3413,7 @@ static int smctr_make_tx_status_code(struct net_device *dev,
>          tsv->svi = TRANSMIT_STATUS_CODE;
>          tsv->svl = S_TRANSMIT_STATUS_CODE;
>
> -        tsv->svv[0] = ((tx_fstatus & 0x0100 >> 6) || IBM_PASS_SOURCE_ADDR);
> +        tsv->svv[0] = ((tx_fstatus & 0x0100 >> 6) | IBM_PASS_SOURCE_ADDR);
>
>          /* Stripped frame status of Transmitted Frame */
>          tsv->svv[1] = tx_fstatus & 0xff;

Hmm, here's a question for you: the old code was equivalent to "tsv->svv[0] = 1;", so what's your proof that we don't rely on this "bug" elsewhere in the code? In other words, this is a significant behavior change (albeit fixing an apparent bug) from what we've done for a while. You might want to do a git-blame on this bit of code to see who the last person to modify it was and ask them to test or confirm the patch first. The same general questions apply to the other logical-op bugs.

Cheers,
Kyle Moffett
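The behavior change can be seen by evaluating both expressions in isolation. The sketch below copies the two expressions from the patch into standalone functions; the function names and the use of unsigned int are assumptions, and IBM_PASS_SOURCE_ADDR is given the value 1 stated in the patch description. Note also that ">>" binds more tightly than "&" in C, so both versions mask with (0x0100 >> 6), i.e. 0x4, rather than shifting the masked value.

```c
#include <assert.h>

#define IBM_PASS_SOURCE_ADDR 1   /* value stated in the patch description */

/* Old expression: logical OR with a non-zero constant is always 1. */
unsigned int svv0_logical_or(unsigned int tx_fstatus)
{
    return ((tx_fstatus & 0x0100 >> 6) || IBM_PASS_SOURCE_ADDR);
}

/* Patched expression: bitwise OR keeps the masked status bit. */
unsigned int svv0_bitwise_or(unsigned int tx_fstatus)
{
    return ((tx_fstatus & 0x0100 >> 6) | IBM_PASS_SOURCE_ADDR);
}

/* Small usage helper for this sketch. */
void svv0_check(void)
{
    assert(svv0_logical_or(0x0004) == 1);
    assert(svv0_bitwise_or(0x0004) == 5);
}
```

With the old code the result is 1 for every input; with the patch it is 5 whenever bit 2 of tx_fstatus is set and 1 otherwise, which is the difference any dependent code would observe.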
Re: "mount --bind" with user/group/mode definition?
On Oct 11, 2007, at 04:35:37, Ph. Marek wrote:
> is there some way to duplicate a directory somewhere else (like with "mount --bind"), but having different owner/group/mode bits?
>
> I'd like to mount a directory I have no control over (think NFS, or floppy, ...) with clearly defined rights - like root:, mode 0550 for all directories, and 0440 for all files. (Here I want to have full *read* control, regardless of the original permissions).
>
> [ I know that this special case can be (mostly) done by a read-only binding mount; the part that is missing is eg. files with a different owner being 0700. ]
>
> I know that something like this is possible for eg. VFAT, which has no right descriptors for itself; but I'd need that for arbitrary directory trees, who themselves *have* permissions set.
>
> Is there some way to achieve that?

Not at the moment, unfortunately. I suspect that with the recent developments in user container support and/or overlay mounting it will become possible to either write a UID/GID-translation overlay filesystem or grant cross-UID-container keys to achieve what you want. On the other hand that probably won't fully happen for up to a year or so.

Cheers,
Kyle Moffett
Re: [PATCH] Version 3 (2.6.23-rc8) Smack: Simplified Mandatory Access Control Kernel
Ok, finally getting some time to work on this stuff once again (life gets really crazy sometimes).

I would like to postulate that you can restate any SMACK policy as a functionally equivalent SELinux policy (with a few slight technical differences, see below). I've been working on a script to do this but keep getting stuck tracking down minor bugs and then get dragged off on other things I need to do. Here is the method I am presently trying to implement:

First divide the SELinux access vectors into 7 groups based on which ones SMACK wishes to influence:
(R) Requires "read" permissions (the 'r' bit)
(W) Requires "write" permissions (the 'w' bit)
(X) Requires "execute" permissions (the 'x' bit)
(A) Requires "append" OR "write" permissions (the 'a' bit)
(P) Requires CAP_MAC_OVERRIDE
(K) May not be performed by a non-CAP_MAC_OVERRIDE process on a CAP_MAC_OVERRIDE process
(N) Does not require any special permissions

The letters in front indicate the names I will use in the rest of this document to describe the sets of access vectors.

Next define a single SELinux user "smack", and two independent roles, "priv" and "nopriv". We create the set of SMACK equivalence-classes defined as various SELinux types with substitutions for "*", "^", "_", and "?", and then completely omit the MLS portions of the SELinux policy.

The next step is to establish the fundamental constraints of the policy. To prevent processes from gaining CAP_MAC_OVERRIDE we iterate over the access vectors in (K) and add the following constraint for each vector:

constrain $OBJECT_CLASS $ACCESS_VECTOR ((r1 == r2) || (r1 == priv))

This also includes:

constrain process transition ((r1 == r2) || (r1 == priv))

Then we require privilege to access the (P) vectors; for each vector in (P) we add a constraint:

constrain $OBJECT_CLASS $ACCESS_VECTOR (r1 == priv)

At this point the only rules left to add are the between-type rules. Here it gets mildly complicated because SMACK is a linear-lookup system (each rule must be matched in order) whereas SELinux is a globally-unique-lookup system (all rules are mutually exclusive and matched simultaneously). Essentially for each SMACK rule:

$SOURCE $DEST $PERM_BITS

We iterate over all of the classes represented in the access vector lists in $PERM_BITS and create rules for each one:

allow { $SOURCE } { $DEST }:$PERM_CLASS { $PERM_VECTORS };

If you need SMACK to allow subtractive permissions then you need to expand that further; however, I believe as an initial cut that is sufficient.

The only other task is to prepend the auto-generated object-class and access-vector lists to the policy and append the initial SIDs that smack wants various objects to have, as well as allowing the "smack" user the "priv" and "nopriv" roles and allowing those two roles entry into all of the SMACK types.

The resulting SELinux-ified SMACK labels would go from:

SomeLabel (with CAP_MAC_OVERRIDE)
AnotherLabel
YetAnotherLabel

to:

smack:priv:SomeLabel
smack:nopriv:AnotherLabel
smack:nopriv:YetAnotherLabel

Casey, hopefully this gives you some ideas about how I think you could modify the SELinux code to compile out the "user" field and simplify the "role" field as needed. I'm still not seeing anything which SELinux cannot directly implement without additional code, even the "CAP_MAC_OVERRIDE" bit. If the semantics don't seem quite right, please provide details about how you think the models differ and I will try to address the concerns.

Cheers,
Kyle Moffett
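To make the per-rule translation step concrete, here is a heavily simplified sketch covering only the SELinux "file" object class. It is not the Perl script mentioned in the message; the function names, the fixed letter-to-permission table and the single-class restriction are assumptions made purely for illustration, whereas the method described above would iterate over every class and access vector in the (R), (W), (X) and (A) groups.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping from a SMACK permission letter to one permission
 * of the SELinux "file" class. */
static const char *file_perm_for(char smack_bit)
{
    switch (smack_bit) {
    case 'r': return "read";
    case 'w': return "write";
    case 'x': return "execute";
    case 'a': return "append";
    default:  return NULL;
    }
}

/* Append `text` to the NUL-terminated string in `out`, with bounds check. */
static int append(char *out, size_t out_len, const char *text)
{
    size_t used = strlen(out);
    size_t add = strlen(text);

    if (used + add + 1 > out_len)
        return -1;
    memcpy(out + used, text, add + 1);
    return 0;
}

/* Turn one SMACK rule "$SOURCE $DEST $PERM_BITS" into one allow rule for
 * the file class.  Returns 0 on success, -1 on an unknown permission
 * letter or if the buffer is too small. */
int smack_rule_to_allow(const char *source, const char *dest,
                        const char *perm_bits, char *out, size_t out_len)
{
    assert(source && dest && perm_bits && out && out_len > 0);
    out[0] = '\0';

    if (append(out, out_len, "allow ") || append(out, out_len, source) ||
        append(out, out_len, " ") || append(out, out_len, dest) ||
        append(out, out_len, ":file {"))
        return -1;

    for (const char *bit = perm_bits; *bit != '\0'; bit++) {
        const char *perm = file_perm_for(*bit);

        if (perm == NULL)
            return -1;
        if (append(out, out_len, " ") || append(out, out_len, perm))
            return -1;
    }
    return append(out, out_len, " };");
}
```

For the SMACK rule "SomeLabel AnotherLabel rw" this produces "allow SomeLabel AnotherLabel:file { read write };".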
Re: [PATCH] Version 3 (2.6.23-rc8) Smack: Simplified Mandatory Access Control Kernel
On Oct 11, 2007, at 11:41:34, Casey Schaufler wrote:
> --- Kyle Moffett <[EMAIL PROTECTED]> wrote:
>> [snipped]
>
> I'm still waiting to see the proposed SELinux policy that does what Smack does.

That *is* the SELinux policy which does what Smack does. I keep having bugs in the perl-script I'm writing on account of not having the time to really get around to fixing it, but that is exactly the procedure for generating an SELinux policy from a SMACK policy.

> I can accept that you don't see anything that can't be implemented thus, but that's not the point. You've provided some really clear design notes, and that's great, but it ain't the code. You said that you could write a 500 line perl script that would do the whole thing, and that left some people with an impression that Smack is a subset of SELinux. Well, I'm already finding myself digging out from under that misunderstanding, and with people who are assuming that your policy has been done, "proving" the point.

I'd love to have time to finish the script but unfortunately real life keeps interfering and I'm going to have to go back to lurking on this thread.

Cheers,
Kyle Moffett
Re: [PATCH] Reserve N process to root
Please don't trim CC lists

On Oct 11, 2007, at 17:02:37, Al Boldi wrote:
> David Newall wrote:
>> [EMAIL PROTECTED] wrote:
>>> What David meant was that "root will always have a slot" doesn't *actually* help unless you *also* have a way to actually *spawn* such a process. In order to do the ps, kill, and so on that you need to recover, you need to already have either a root shell available, or a way to *get* a root shell that doesn't rely on a non-root process (so /bin/su doesn't help here).
>>
>> That's right, although it's worse than that. You need to have a process with CAP_SYS_ADMIN. If root processes normally have that capability then the reserved slots may well disappear before you notice a problem. If root processes normally don't have it, then you need to guarantee that one is already running.
>
> I once posted a patch to handle this DoS, but, as usual, it wasn't accepted. Go figure...

This isn't really necessary any more with the new CFS scheduler. If you want to prevent excess memory usage then you limit memory usage, not process count, so just set the system max process count to something absurdly high and leave the user counts down at the maximum a user might run. Then as long as the sum of the user processes is less than the max number of processes (which you just set absurdly high or unlimited), you may still log in.

With per-user scheduling enabled, CFS allows you to run an optimistically-real-time game as one user and several thousand busy-loops as another user and get an almost picture-perfect 50% CPU distribution between the users. To me that seems a much better DoS-prevention system than limits which don't scale based on how many people are requesting resources.

Cheers,
Kyle Moffett
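The per-user process ceiling referred to above is commonly applied through the RLIMIT_NPROC resource limit (the same limit "ulimit -u" controls). As a rough illustration only, with a hypothetical function name and no claim that this is how any particular login stack configures it, a process could lower its own soft limit like this before starting a user's session:

```c
#include <assert.h>
#include <sys/resource.h>

/* Set the soft limit on the number of processes for the calling user to
 * `wanted`, clamped to the current hard limit.  Returns 0 on success and
 * -1 on failure.  The limit is inherited by child processes. */
int cap_user_process_count(rlim_t wanted)
{
    struct rlimit limit;

    assert(wanted > 0);
    if (getrlimit(RLIMIT_NPROC, &limit) != 0)
        return -1;

    /* Unprivileged callers cannot raise the soft limit past the hard one. */
    if (limit.rlim_max != RLIM_INFINITY && wanted > limit.rlim_max)
        wanted = limit.rlim_max;

    limit.rlim_cur = wanted;
    return setrlimit(RLIMIT_NPROC, &limit);
}
```

The system-wide maximum is a separate setting and is left "absurdly high" in the scheme described; only the per-user figure is kept tight.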
Re: [PATCH] Reserve N process to root
On Oct 12, 2007, at 01:37:23, Al Boldi wrote:
> Kyle Moffett wrote:
>> This isn't really necessary any more with the new CFS scheduler. If you want to prevent excess memory usage then you limit memory usage, not process count, so just set the system max process count to something absurdly high and leave the user counts down at the maximum a user might run. Then as long as the sum of the user processes is less than the max number of processes (which you just set absurdly high or unlimited), you may still log in.
>>
>> With the per-user scheduling enabled CFS allows you to run an optimistically-real-time game as one user and several thousand busy-loops as another user and get almost picture perfect 50% CPU distribution between the users. To me that seems a much better DoS-prevention system than limits which don't scale based on how many people are requesting resources.
>
> You have a point, and resource-controllers can probably control DoS a lot better, but they also incur more overhead. Think of this "lockout prevention" patch as a near zero overhead safety valve.

But why do you need to add "lockout prevention" if it already exists? With CFS' extremely efficient per-user-scheduling (hopefully soon to be the default) there are only two forms of lockout by non-root processes: (1) Running out of PIDs in the box's PID-space (think tens or hundreds of thousands of processes), or (2) Swap-storming the box to death. To put it bluntly, trying to reserve free PID slots is attacking the wrong end of the problem, and your so-called "lockout prevention" could very easily ensure that 10 PIDs are available even if the user has swapstormed the box with the PIDs he does have.

Cheers,
Kyle Moffett
Re: Get physical MAC address
On Jan 01, 2008, at 21:42:18, Jon Masters wrote: On Mon, 2007-12-31 at 12:39 +0700, Theewara Vorakosit wrote: I get MAC address from ioctl. However, ifconfig can change this MAC address. Can I get a real physical MAC address of the NIC? Forgive me reading into your mail...this smells a bit like some kind of licensing/compliance thing. Just bear in mind that using the MAC to verify the identity of a machine is utterly useless and pointless - anyone can trivially fool your software[0] to see what it "wants". Not necessarily; I can easily see distros wanting to have a "Restore defaults" button in their network config windows which also includes restoring the default MAC address to the NIC. It should also be pointed out that anybody with one of a selection of re-flashable NICs (or NICs with removable EEPROMs) can easily change the MAC address on their NIC. Other alternatives include renaming eth0 to mynet0 and creating a downed dummy interface called "eth0" with the desired MAC addr. [0] We used to have to do far worse kludgery in college, in order to prevent the silly powers that be who "banned" network cards other than those made by one manufacturer from being used on their little network. Well for basically any userspace-level check, all it takes is somebody who knows ASM and has about 5 minutes to track down the problematic branch instructions. Then they just have to write a 10-line GDB script which starts the program, traps the appropriate instructions, and then changes a "0" to a "1" (or vice versa) before the conditional branch. On Windows it's vaguely practical (albeit crash-prone) to load a kernel hack which prevents your program from being debugged, but under Linux it's effectively impossible. Cheers, Kyle Moffett
Re: git guidance
On Nov 29, 2007, at 00:27:04, Al Boldi wrote: Jakub Narebski wrote: Besides, you can always use "git show :". For example gitweb (and I think other web interfaces) can show any version of a file or a directory, accessing only repository. Sure, browsing is the easy part, but Version Control starts when things become writable. But... git history is inherently and completely immutable once created... that's the only way you can index everything with a simple SHA-1. If you want to write to the "git filesystem" by adding new commits then you need to use the appropriate commands, same as every other VCS on the planet. Cheers, Kyle Moffett
Re: Kernel Development & Objective-C
On Nov 30, 2007, at 13:40:07, H. Peter Anvin wrote: Kyle Moffett wrote: With that said, there is a significant performance penalty as all Objective-C method calls are looked up symbolically at runtime for every single call. GACK! At least C++ has vtables. In a tight loop there is a way to do a single symbolic lookup and just call directly through a function pointer, but typically it isn't necessary for GUI programs and the like. The flexibility of being able to dynamically add new methods to an existing class (at least for desktop user interfaces) significantly outweighs the performance cost. Any performance-sensitive code is typically written in straight C anyways. Cheers, Kyle Moffett
Re: Kernel Development & Objective-C
On Nov 30, 2007, at 09:34:45, Lennart Sorensen wrote: On Thu, Nov 29, 2007 at 12:14:16PM +, Ben Crowhurst wrote: Has Objective-C ever been considered for kernel development? Doesn't Objective-C essentially require a runtime to provide a lot of the features of the language? If it does (as I suspect) then it is totally unsuitable for kernel development. That and object oriented languages in general are badly designed and a bad idea. Having not used Objective-C I have no idea if it qualifies as badly designed or not. Certainly C++ and Java are both very badly designed. Objective-C is actually a pretty minimal wrapper around C; it was originally implemented as a C preprocessor. It generally does not have any kind of memory management, garbage collection, or anything else (although typically a "runtime" will provide those features). There are no first-class exceptions, so there would be nothing to worry about there (the exceptions used in GUI programs are built around the setjmp/longjmp primitives). Objective-C is also almost completely backwards-compatible with C, much more so than C++ ever was. As far as the runtime goes the kernel would be expected to write its own, the same way that it implements "kmalloc()" as part of a "C runtime". Since the runtime itself never does any implicit memory allocation, I think it would conceivably even be relatively safe for kernel usage. With that said, there is a significant performance penalty as all Objective-C method calls are looked up symbolically at runtime for every single call. For GUI programs where large chunks of the code are event-loops and not performance-sensitive that provides a huge amount of extra flexibility. In the kernel though, there are many codepaths where *every* *single* instruction counts; that could be a serious performance hit.
Cheers, Kyle Moffett
Re: Relax permissions for reading hard drive serial number?
On Dec 02, 2007, at 13:45:44, Matti Aarnio wrote: This lack of having stable(*) unique system identifier available to applications is one of the small details that make node locked commercial software delivery challenging thing in UNIX environments.. *) "stable" as both stable data, and stable API to get it. Well... There's that. There's also the fact that anybody with a modicum of ASM programming skills who gets clever with GDB and compares traces from "Correct HW serial" and "Incorrect HW serial" runs can write a 10-line GDB script to make it work regardless. I did something similar with a popular FPS (which I legitimately own) on one of my Mac systems after having left the DVD behind when going to a LAN party. Addresses removed to protect the innocent^Wguilty, but they took maybe 15 minutes to acquire:

    break *END_OF_CDKEY_CODE_DECRYPTION
    run
    delete 1
    advance *JUST_AFTER_CDKEY_CHECK
    set $r3 = 0
    detach

At some point every such "locked" computer program has code like this:

    if (program_is_not_authorized()) {
        display_nasty_dialog();
        exit(1);
    }

All it takes for somebody with a debugger is to identify the last instruction of the "program_is_not_authorized()" function and change $r3 (or whatever return register your system uses) from a 1 to a 0. The fact remains that once the software is running on *THEIR* computer there is nothing you can practically do to forcibly prevent them from using it in whatever fashion they desire. Typically if you price your software reasonably people will be willing to pay for multiple copies but there are no foolproof technical measures to enforce that they do so. Cheers, Kyle Moffett
Re: [PATCH] Reduce stack used by lib/hexdump.c
On Dec 05, 2007, at 21:42:35, Joe Perches wrote: On Wed, 2007-12-05 at 18:18 -0800, Randy Dunlap wrote: Joe Perches wrote: Maybe just eliminate the 16 or 32 byte width option and force it to only 16 byte widths. Have you checked users (callers)? I'm pretty sure that one of the callers wanted 32 and that's why it's there. I did. There is only 1 subsystem. That's easy to change. drivers/mtd/ubi/debug.c: print_hex_dump(KERN_DEBUG, "", DUMP_PREFIX_OFFSET, 32, 1, drivers/mtd/ubi/io.c: print_hex_dump(KERN_DEBUG, "", DUMP_PREFIX_OFFSET, 32, 1, Long lines in the log file are not too easy to read anyway. Using 16 byte dumps per line instead of 32 isn't painful. It gets rid of the allocation, reduces the argument count and makes the kernel smaller. I think it's all good. Every current caller would have to change though. Alternatively, since print_hex_dump is not a performance-critical path (and usually indicates an error/debug condition), you could probably just make a static "hexdump_lock" spinlock and spin_lock_irqsave()/spin_unlock_irqrestore(). It would always nest inside any other lock (except during crash, where we break locks already for printk()), and I doubt any of the callers would notice the serialization since they're already serialized on the printk buffer. Cheers, Kyle Moffett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: New Address Family: Inter Process Networking (IPN)
On Dec 06, 2007, at 00:30:16, Renzo Davoli wrote: AF_IPN is different. AF_IPN is the broadcast and peer-to-peer extension of AF_UNIX. It supports communication among *user* processes. Ok, you say it's different, but then you describe how IP unicast and broadcast work. Both are frequently used for communication among "*user* processes". Please provide significantly more details about exactly *how* it's different. Example: Qemu, User-Mode Linux, KVM, our umview machines can use IPN as an Ethernet Hub and communicate among themselves with the hosting computer and the world by a tap like interface. You say "tap like" interface, but people do this already with existing infrastructure. You can connect Qemu, UML, and KVM to a standard Linux "tap" interface, and then use the standard Linux bridging code to connect the "tap" interface to your existing network interfaces. Alternatively you could use the standard and well-tested IP routing/firewalling/NAT code to move your packets around. None of this requires new network infrastructure in the slightest. If you have problems with the existing code, please improve it instead of creating a slightly incompatible replacement which has different bugs and workarounds. You can also grab an interface (say eth1) and use eth0 for your hosting computer and eth1 for the IPN network of virtual machines. You can do that already with the bridging code. If you load the kvde_switch submodule IPN can be a virtual Ethernet switch. As I described above, this can be done with the existing bridging and tun/tap code. Another Example: You have a continuous stream of data packets generated by a process, and you want to send this data to many processes. Maybe the set of processes is not known in advance, you want to send the data to any interested process. Some kind of publish&subscribe communication service (among unix processes not on TCP-IP). Without IPN you need a server. 
With IPN the sender creates the socket, connects to it and feeds it with data packets. All the interested receivers connect to it and start reading. That's all. This is already done frequently in userspace. Just register a port number with IANA on which to implement a "registration" server and write a little daemon to listen on 127.0.0.1:${YOUR_PORT}. Your interconnecting programs then use either unicast or multicast sockets to bind, then report to the registration server what service they are offering and what port it's on. Your "receivers" then connect to the registration server, ask what port a given service is on, and then multicast-listen or unicast-connect to access that service. The best part is that all of the performance implications are already thoroughly understood. Furthermore, if you want to extend your communication protocol to other hosts as well, you just have to replace the 127.0.0.1 bind with a global bind. This is exactly how the standard-specified multiple-participant "SIP" protocol works, for example. So if you really think this is something that belongs in the kernel you need to provide much more detailed descriptions and use-cases for why it cannot be implemented in user-space or with small modifications to existing UDP/TCP networking. Cheers, Kyle Moffett
Re: Futexes and network filesystems.
On Nov 20, 2007, at 17:53:52, Eric W. Biederman wrote: I had a chance to think about this a bit more, and realized that the problem is that futexes don't appear to work on network filesystems, even if the network filesystems provide coherent shared memory. It seems to me that we need to have a call that gets a unique token for a process for each filesystem for use in futexes (especially robust futexes). Say get_fs_task_id(const char *path); On local filesystems this could just be the pid as we use today, but for filesystems that can be accessed from contexts with potentially overlapping pid values this could be something else. It is an extra syscall in the preparation path, but it should be hardly more expensive than the current getpid(). Once we have fixed the futex infrastructure to be able to handle futexes on network filesystems, the pid namespace case will be trivial to implement. Actually, I would think that get_vm_task_id(void *addr) would be a more useful interface. The call would still be a relatively simple lookup to find the struct file associated with the particular virtual mapping, but it would be race-free from the perspective of userspace and would not require that we somehow figure out the file descriptor associated with a particular mmap() (which may be closed by this point in time). Useful extensions would be get_fd_task_id(int fd) and get_fs_task_id(const char *path), but those are less important. The other important thing is to ensure that somehow the numbers are considered unique only within the particular domain of a container, such that you can migrate a container from one system to another even using a simple local ext3 filesystem (on a networked block device) and still be able to have things work properly even after the migration. Naturally this would only work with an upgraded libc but I think that's a reasonable requirement to enforce for migration of futexes and cross-network futexes. 
Even for network filesystems which don't implement coherent shared memory, you might add a memexcl() system call which (when used by multiple cooperating processes) ensures that a given page is only ever mapped by at most one computer accessing a given network filesystem. The page-outs and page-ins when shuttling that page across the network would be expensive, but I believe the cost would be reasonable for many applications and it would allow traditional atomic ops on the mapped pages to take and release futexes in the uncontended case. Cheers, Kyle Moffett
Re: [RFC] Documentation about unaligned memory access
On Nov 22, 2007, at 20:29:11, Alan Cox wrote: Most architectures are unable to perform unaligned memory accesses. Any unaligned access causes a processor exception. Not all. Some simply produce the wrong answer - that's oh so much more exciting. As one example, the MicroBlaze soft-core processor family designed for use on Xilinx FPGAs will (by default) simply forcibly zero the lower bits of the unaligned address, such that the following code will fail mysteriously:

    const char foo[] = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07 };
    printf("0x%08x 0x%08x 0x%08x 0x%08x\n",
           *((const u32 *)(foo+0)), *((const u32 *)(foo+1)),
           *((const u32 *)(foo+2)), *((const u32 *)(foo+3)));

Instead of outputting:

    0x00010203 0x01020304 0x02030405 0x03040506

It will output:

    0x00010203 0x00010203 0x00010203 0x00010203

Other embedded architectures have very similar problems. Some may provide an "unaligned data access" exception, but offer insufficient information to repair the damage and resume execution. Cheers, Kyle Moffett
Re: + smack-version-11c-simplified-mandatory-access-control-kernel.patch added to -mm tree
On Nov 24, 2007, at 06:39:34, Crispin Cowan wrote: Andrew Morgan wrote: It feels to me as if a MAC "override capability" is, if true to its name, extra to the MAC model; any MAC model that needs an 'override' to function seems under-specified... SELinux clearly feels no need for one, That's not quite right. More specifically, it already has one in the form of unconfined_t. AppArmor has a similar escape hatch in the "Ux" permission. Its not that they don't need one, it is that they already have one. They get to have one because they allow you to actually write a policy that is more nuanced than "process label must dominate object label". Actually, a fully-secured strict-mode SELinux system will have no unconfined_t processes; none of my test systems have any. Generally "unconfined_t" is used for situations similar to what AppArmor was designed for, where the only "interesting" security is that of the daemon (which is properly labelled) and one or more of the users are unconfined. Even then "unconfined_t" is not an implicit part of the policy, it is explicitly given the ability to take any action on any object by rules in the policy, and it typically still falls under a few MLS labeling restrictions even in the targeted policy. Cheers, Kyle Moffett - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: + smack-version-11c-simplified-mandatory-access-control-kernel.patch added to -mm tree
On Nov 24, 2007, at 22:36:43, Crispin Cowan wrote: Kyle Moffett wrote: Actually, a fully-secured strict-mode SELinux system will have no unconfined_t processes; none of my test systems have any. Generally "unconfined_t" is used for situations similar to what AppArmor was designed for, where the only "interesting" security is that of the daemon (which is properly labelled) and one or more of the users are unconfined. Interesting. In a Targeted Policy, you do your policy administration from unconfined_t. But how do you administer a Strict Policy machine? I can think of 2 ways: [snip] * there is some type that is tighter than unconfined_t but none the less has sufficient privilege to change policy To me, this would be semantically equivalent to unconfined_t, because any rogue code or user with this type could then fabricate unconfined_t and do what they want Well, in a strict SELinux system, someone who has been permitted the "Security Administrator" role (secadm_r) and who has logged in through a "login_t" process may modify and reload the policy. They are also permitted to view all files up to their clearance, write files below their level, and relabel files. On the other hand, they do not have any system-administration privileges (those are reserve for sysadm_r). Under the default policy the security administrator may disable SELinux completely, although that too can be adjusted as "load policy" is yet another specialized permission. Cheers, Kyle Moffett - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: freeze vs freezer
On Nov 27, 2007, at 12:40:24, Rafael J. Wysocki wrote: On Tuesday, 27 of November 2007, Matthew Garrett wrote: On Mon, Nov 26, 2007 at 10:53:34PM +0100, Rafael J. Wysocki wrote: On Monday, 26 of November 2007, David Chinner wrote: So how do you handle threads that are blocked on I/O or a lock during the system freeze process, then? We wait until they can continue. So if I have a process blocked on an unavailable NFS mount, I can't suspend? That's correct, you can't. [And I know what you're going to say. ;-)] Why exactly does suspend/hibernation depend on "TASK_INTERRUPTIBLE" instead of a zero preempt_count()? Really what we should do is just iterate over all of the actual physical devices and tell each one "Block new IO requests preemptably, finish pending DMA, put the hardware in low-power mode, and prepare for suspend/hibernate". As long as each driver knows how to do those simple things we can have an entirely consistent kernel image for both suspend and for hibernation. When all tasks are preemptable we can very trivially rely on the drivers to enforce the "Stop new IO submission" with a dirt-simple semaphore or waitqueue. The sleep itself will be TASK_UNINTERRUPTIBLE, but it will be done from a preemptible context. That way the system suspend time is the sum of the suspend times of the devices on the system, and the suspend time of any given device is the sum of its maximum non-preemptible critical section and the time to flush all of its remaining pending DMA/etc. This is almost completely independent of the load-level of the machine, and it does not depend on things like NFS filesystems. The one gotcha is that it does not flush dirty filesystem pages to disk first, although that could be fixed with a few VFS and blockdev hooks which hierarchically flush and "freeze" block devices and filesystems before actually disabling devices, much the way that device-mapper can pause a device to take a snapshot and end up with a clean journal on the filesystem afterwards. 
Cheers, Kyle Moffett
Re: freeze vs freezer
On Nov 27, 2007, at 17:49:18, Jeremy Fitzhardinge wrote: Rafael J. Wysocki wrote: Well, this is more-or-less how we all imagine that should be done eventually. The main problem is how to implement it without causing too much breakage. Also, there are some dirty details that need to be taken into consideration. For Xen suspend/resume, I'd like to use the freezer to get all threads into a known consistent state (where, specifically, they don't have any outstanding pagetable updates pending). In other words, the freezer as it currently stands is what I want, modulo some of these issues where it gets caught up unexpectedly. If threads end up getting frozen anywhere preempt isn't explicitly disabled, it wouldn't work for me. The problem with "one freezer" is that "known consistent state" means something completely different to every single driver and subsystem. Xen wants it to mean "No pending page table updates and no more updates from this point forward". A network driver wants it to mean "All pending network packets DMAed out or in and the device shut down with all remaining packets queued. A SATA controller wants it to mean "All DMA quiesced and no more commands", etc. The only way to have that work is to put minimal definitions of what state you care about in the drivers themselves. For Xen this means that you need to have an appropriately-timed suspend handler which hooks into Xen code very precisely to create and preserve the "No pending page table updates" state that you care about. It will be more work in the short term but it's the only maintainable solution in the long term IMO. Cheers, Kyle Moffett - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 00/26] Permit filesystem local caching
On Jan 15, 2008, at 18:46, David Howells wrote: (*) 01-keys-inc-payload.diff (*) 02-keys-search-keyring.diff (*) 03-keys-callout-blob.diff One vaguely related question: Is there presently any way to adjust the per-user max-key-data limit? I've been tinkering with using the new-ish MIT kerberos "KEYRING:" credentials-cache code to hold keys for persistent daemons. Unfortunately "root" keeps hitting the limit even with only about 16 keys allocated across a few sessions. After perusing the docs I can't find any documentation on adjusting the limits. I'd really like some way to specifically allow root to allocate up to several megs worth of non-swappable key data, although I suppose just increasing the global limit slightly wouldn't be bad either. If such functionality already exists then I'd appreciate a pointer to it (and possibly respond in kind with documentation patches). Cheers, Kyle Moffett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [MC] [CHECKER] Need help on mmap on FUSE (linux user-land file system)
On Mar 13, 2005, at 02:28, Junfeng Yang wrote: Forgot to mention, we are checking linux 2.6. It appears to us that mmap doesn't work for FUSE in linux 2.6. IIRC, the reason mmap doesn't work on FUSE is because when it dirties pages they cannot be flushed reliably, because writing them out involves calling a userspace process which may allocate RAM, etc. Cheers, Kyle Moffett -BEGIN GEEK CODE BLOCK- Version: 3.12 GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$ L(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+ PGP+++ t+(+++) 5 X R? tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r !y?(-) --END GEEK CODE BLOCK--
[BUG?] Signedness of __kernel_nlink_t on Sparc?
In include/asm-sparc/types.h, __kernel_nlink_t is signed, whereas on all the other architectures it is unsigned. Is this intentional, or a bug? Cheers, Kyle Moffett -BEGIN GEEK CODE BLOCK- Version: 3.12 GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$ L(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+ PGP+++ t+(+++) 5 X R? tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r !y?(-) --END GEEK CODE BLOCK-- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][RFC] Make /proc/ chmod'able
On Mar 15, 2005, at 16:18, Rene Scharfe wrote: It's easily visible in the style of public toilets: in some countries you have one big room with no walls in between where all men or women merrily shit together, in other countries (like mine) every person can lock himself into a private closet. Both ways work, there's nothing too special about using a toilet, but I'm simply used to the privacy provided by those thin walls. I assure you, I don't do anything evil in there. :] Just as long as our lab's "bathrooms" don't mysteriously get a bazillion walls all over the place on kernel upgrade, we're ok. I don't mind adding new options for advanced security, as long as you don't change the defaults. It's hard enough managing a boatload of workstations under ideal conditions. When the default settings change every month it gets really annoying really quickly. :-D. Cheers, Kyle Moffett
Re: Real-Time Preemption and RCU
On Mar 19, 2005, at 11:31, Ingo Molnar wrote: What about allowing only as many concurrent readers as there are CPUs? Since a reader may be preempted by a higher prio task, there is no linear relationship between CPU utilization and the number of readers allowed. You could easily end up having all the nr_cpus readers preempted on one CPU. It gets pretty messy. One solution I can think of, although it bloats memory usage for many-way boxen, is to just have a table in the rwlock with one entry per CPU. Each CPU would get one concurrent reader; others would need to sleep. Cheers, Kyle Moffett
Re: current linus bk, error mounting root
On Mar 21, 2005, at 19:19, Andrew Morton wrote: Jon Smirl <[EMAIL PROTECTED]> wrote: Jens is right that this is a user space issue, but how many people are going to find this out the hard way when their root drives stop mounting. Since no one is complaining I have to assume that most kernel developers have their root device drivers built into the kernel. I was loading mine as a module since for a long time Redhat was not shipping kernels with SATA built in. I don't agree that this is a userspace issue. It's just not sane for a driver to be in an unusable state for an arbitrary length of time after modprobe returns. What about if I'm booting from a USB drive? In that case, because of the asynchrony of USB probing, it may take 1 or 2 seconds for my attached hub to power on, wake up, boot its embedded microprocessor, etc before it will respond to signals. In such a case, as far as the root hub can tell, there are _no_ external devices for a couple seconds, and that's ignoring that my external USB bootdrive may _also_ need time to "boot" before it will be accessible, and that's only once its parent hub has become available. I think that the kernel needs some kind of wait-for-device API that is accessible from kernel-space for the simple boot sequence, perhaps just waiting for a specific kobject to be detected and complete initialization. For an initrd/initramfs in userspace, dnotify on sysfs (For the static /dev case), or dnotify on /dev (For the udev case) should allow it to detect when the device is available. Cheers, Kyle Moffett -BEGIN GEEK CODE BLOCK- Version: 3.12 GCM/CS/IT/U d- s++: a18 C>$ UB/L/X/*(+)>$ P+++()>$ L(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+ PGP+++ t+(+++) 5 X R? 
tv-(--) b(++) DI+ D+ G e->$ h!*()>++$ r !y?(-) --END GEEK CODE BLOCK--
Re: forkbombing Linux distributions
On Mar 23, 2005, at 09:43, Jan Engelhardt wrote:
>> brings down almost all linux distro's while other *nixes survives.
>
> Let's see if this can be confirmed.

Here at my school we have the workstations running Debian testing. We have edited /etc/security/limits.conf to have a much more restrictive startup environment for user processes, limiting to 100 processes per user and clamping maximum CPU time to 4 hours per process.

It's not failsafe, but we also have all of the kernel threads set at realtime levels, with the IRQ threads specifically set at SCHED_RR 99, and we have a sulogin-type process on tty12 at SCHED_RR 99. Even in the event of the worst kind of forkbomb, the terminal is as responsive as if nothing else were running and allows us to kill the offending processes easily, because when the scheduler refuses to interrupt the killall process to run anything else, no other forkbomb processes get started.

I suppose a similar situation could be set up with a user-accessible server and a rate-limited SSH daemon if necessary, although a ttyS0 console via a console server might work better. In any case, I think that while there could perhaps be a better interface for user-limits in the kernel, the existing one works fine for most purposes, when combined with appropriate administrative tools.

Cheers,
Kyle Moffett
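For illustration, the two limits mentioned above (100 processes per user, 4 hours of CPU time per process) correspond to the RLIMIT_NPROC and RLIMIT_CPU resource limits. The sketch below applies them to a single child process from Python rather than through /etc/security/limits.conf, so it is an approximation of the setup described, not a copy of it. The helper names clamp, apply_limits and run_restricted are invented for this example.

```python
import resource
import subprocess
import sys

MAX_PROCESSES = 100            # per-user process cap from the message
MAX_CPU_SECONDS = 4 * 60 * 60  # 4 hours of CPU time per process


def clamp(wanted, hard):
    """Return `wanted`, lowered to the current hard limit if that is stricter."""
    if hard == resource.RLIM_INFINITY:
        return wanted
    return min(wanted, hard)


def apply_limits():
    """Lower RLIMIT_NPROC and RLIMIT_CPU for the calling process."""
    for res, wanted in ((resource.RLIMIT_NPROC, MAX_PROCESSES),
                        (resource.RLIMIT_CPU, MAX_CPU_SECONDS)):
        _, hard = resource.getrlimit(res)
        value = clamp(wanted, hard)
        # Setting soft and hard together means the child cannot raise them.
        resource.setrlimit(res, (value, value))


def run_restricted(argv):
    """Run `argv` in a child that has the limits applied before exec."""
    return subprocess.run(argv, preexec_fn=apply_limits,
                          capture_output=True, text=True)


if __name__ == "__main__":
    probe = "import resource; print(resource.getrlimit(resource.RLIMIT_NPROC))"
    print(run_restricted([sys.executable, "-c", probe]).stdout.strip())
```

With the process cap in place a forkbomb hits fork() failures once the user reaches the limit; as the message says, this is not failsafe on its own, which is why the realtime rescue console is part of the setup.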
Re: [OT] speeding boot process (was Re: [ANNOUNCE] hotplug-ng 001 release)
On Feb 14, 2005, at 20:17, Lee Revell wrote:
> On Mon, 2005-02-14 at 16:16 -0800, Tim Bird wrote:
>> Lee Revell wrote:
>>> But, I was referring more to things like GDM not being started until all the other init scripts are done. Why not start it first, and let the network initialize while the user is logging in?
>>
>> There are a number of techniques used by CE vendors to get fast bootup time. Some CE products boot Linux in under 1 second. Sony's best Linux boot time in the lab (from power on to user space) was 148 milliseconds, on an ARM chip (running at 200 MHz I believe).
>
> The reason I marked my response OT is that the time from power on to userspace does not seem to be a big problem. It's the amount of time from user space to presenting a login prompt that's way too long. My distro (Debian) runs all the init scripts one at a time, and GDM is the last thing that gets run. There is just no reason for this. We should start X and initialize the display and get the login prompt up there ASAP, and let the system acquire the DHCP lease and start sendmail and apache and get the date from the NTP server *in the background while I am logging in*. It's not rocket science.

Such a system needs a drastically different bootup process than currently exists, including the ability to specify init-script dependencies. (For example, user login via GDM (and with our setup, GDM working at all) requires that AFS is mounted and NIS is working, which both require the network to be available, which requires... You can see where this is going.)

I think eventually we need a better /sbin/init, one that can use a traditional legacy /etc/inittab file in addition to a newfangled simultaneous boot process with lots of ways to start various kinds of services. Unfortunately such a system will need a _LOT_ of work and testing to make sure it doesn't break existing setups.

Oh well, I can dream, can't I? :-D

Cheers,
Kyle Moffett
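To make the dependency idea above concrete, here is a small sketch that groups services into "waves" that could be started simultaneously. It uses the GDM / AFS / NIS / network chain from the message as its only input. It relies on Python's standard graphlib module (Python 3.9 or later); the function name boot_waves and the dictionary layout are assumptions for this example and do not describe any existing init system.

```python
from graphlib import TopologicalSorter

# Service -> services that must be up first (taken from the example in
# the message: GDM needs AFS and NIS, which both need the network).
DEPENDS_ON = {
    "network": set(),
    "afs": {"network"},
    "nis": {"network"},
    "gdm": {"afs", "nis"},
}


def boot_waves(depends_on):
    """Group services into waves; everything in one wave can start in parallel.

    Hypothetical helper. Raises graphlib.CycleError if the dependencies
    contain a cycle.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)                 # pretend the whole wave started
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(boot_waves(DEPENDS_ON), start=1):
        print(number, ", ".join(wave))
```

A real init would not wait for a whole wave to finish; it would start each service as soon as its own dependencies are up. The wave grouping is just the easiest way to show which scripts never needed to run one at a time.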
Re: Drive missing only with LVM kernel
On Jan 26, 2005, at 03:34, Jasper Koolhaas wrote:
> Oh, and I'm using a devfs so "cd /dev && ./MAKEDEV hdg" is not the solution I think. The odd thing is that without LVM compiled in the kernel or as module /dev/hdg is accessible through devfs and with LVM not.

Well, devfs has been deprecated and mostly unmaintained since before 2.6.0 was released, so it really doesn't surprise me. Go download and install udev, hotplug, etc. from your distro.

Cheers,
Kyle Moffett