Re: More documentation: system call how-to
On Wed, Aug 01, 2007 at 02:06:57PM -0400, Ulrich Drepper wrote:
> I've added a few rules I could think of right now.  What should be
> added as well is a rule for 64-bit parameters on 32-bit platforms.  I
> leave this to the s390 people who have the biggest restrictions when
> it comes to this.

David Woodhouse wrote that already.  Don't know if there is a patch
pending:

http://marc.info/?l=linux-arch&m=118277150812137&w=2
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: More documentation: system call how-to
Hi Ulrich,

On Wed, 1 Aug 2007, Ulrich Drepper wrote:
> How about adding the attached text to the Documentation directory?
> Over the years I have had to correct one system call design problem or
> another.  Other problems couldn't be corrected anymore and we have to
> live with them.  Maybe spelling out the rules explicitly will help a bit.

Most definitely, but going through the list below, I could think of
several more little things that people tend to forget when actually
/implementing/ the system call (as opposed to the abstract design
decisions such as arguments and sizes).

> I've added a few rules I could think of right now.  What should be
> added as well is a rule for 64-bit parameters on 32-bit platforms.  I
> leave this to the s390 people who have the biggest restrictions when
> it comes to this.

Yes, that must definitely be spelt out clearly, probably with examples
of how to do it right.

Another thing that's a must when designing a syscall is thinking through
any security implications it brings about and clearly spelling out the
expected behaviour in all cases -- security could mean different things
for different syscalls, but just getting that word in here would mean
people don't make basic mistakes like introducing "xxx_set_xxx" kind of
syscalls that go ahead and modify kernel/global structures without the
authors having even thought about how and why that's wrong.

Other than that, as I said above, what we probably also need is a
"system call implementation checklist" of some sort, which lists the
basic things (copying buffers from/to userspace, various security
checks, other things I'm not recollecting currently) and how to get them
right.

> Signed-off-by: Ulrich Drepper <[EMAIL PROTECTED]>
>
> Rules for designing new system calls
>
> 1. Do not use multiplexing system calls.
>
>    A practical argument is that it invariably reduces the number of
>    available parameters to the system call, which will haunt people who
>    have to care about architectures with a limited set of registers
>    reserved for this purpose.
>
>    Another aspect is that it is most likely slower.  The caller in
>    most cases knows exactly which sub-function of the system call is
>    needed.  If the decision about the sub-function is dynamic, the
>    computation of the code could just as well be a computation of a
>    system call number.  The difference lies in the kernel, where the
>    multiplexing always has to happen, even if the required
>    sub-function is known to the caller ahead of time.
>
>    Adding new system calls is much cheaper: it is a word in a table.
>    This is much less code and data than the switch statement or
>    if-cascade needed to implement the multiplexer.
>
>    Bad examples: sys_socketcall on x86, sys_futex, and several more
>
> 2. Use of ENOSYS:
>
>    The runtime has to be able to distinguish non-existing system calls
>    due to old kernel versions from error conditions in an implemented
>    system call.  This means the ENOSYS error should never be used in
>    an error condition once a system call is implemented.
>
>    Example: In sys_fallocate, if the file system does not implement the
>    fallocate operation, return EOPNOTSUPP and not ENOSYS.
>
>    There is one exception to the rule: if rule #1 is violated and a
>    multiplexer system call is used, invalid sub-function codes should
>    be signaled using ENOSYS.
>
>    Example: sys_futex
     ^^^
Probably makes sense to prefix "sad" or "unfortunate" here.

> 3. Choose parameters for growth
>
>    Today it no longer makes sense to implement any system call which,
>    even on 32-bit machines, restricts values indicating file sizes or
>    offsets to 32 bits.  64-bit values should be used throughout.
>
>    Example: sys_fadvise64, which should have been defined from day 1
>    like sys_fadvise64_64.

Again, this is a "bad" example.
>    Similarly, timeout granularity of seconds is not suitable anymore.
>    Most interfaces use nanosecond resolution, and an often-used way
>    to specify such times and intervals is the timespec structure.

Satyam
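As a possible starting point for the "examples of how to do it right" asked for above: one common way a 64-bit value crosses a 32-bit system call boundary is as two 32-bit halves that the entry point joins again. The sketch below is plain user-space C and only shows that arithmetic. The function names are hypothetical, and which registers carry which half (and whether a pair must be aligned) is architecture-specific and not modelled here.

```c
#include <stdint.h>

/* Hypothetical helper: rebuild a 64-bit value from the two 32-bit
 * halves a 32-bit caller passed as separate arguments. */
uint64_t join_halves_sketch(uint32_t low, uint32_t high)
{
    /* Widen before shifting; shifting a 32-bit value by 32 is undefined. */
    return ((uint64_t)high << 32) | low;
}

/* Hypothetical caller-side counterpart: split a 64-bit value into the
 * two halves that would be placed in separate argument slots. */
void split_halves_sketch(uint64_t value, uint32_t *low, uint32_t *high)
{
    *low = (uint32_t)(value & 0xffffffffu);
    *high = (uint32_t)(value >> 32);
}
```

Splitting and joining are exact inverses, so no information is lost at the boundary. The cost is one extra argument slot per 64-bit parameter, which is why the register limits mentioned in the thread matter.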
More documentation: system call how-to
How about adding the attached text to the Documentation directory?
Over the years I have had to correct one system call design problem or
another.  Other problems couldn't be corrected anymore and we have to
live with them.  Maybe spelling out the rules explicitly will help a bit.

I've added a few rules I could think of right now.  What should be
added as well is a rule for 64-bit parameters on 32-bit platforms.  I
leave this to the s390 people who have the biggest restrictions when
it comes to this.

Signed-off-by: Ulrich Drepper <[EMAIL PROTECTED]>

Rules for designing new system calls

1. Do not use multiplexing system calls.

   A practical argument is that it invariably reduces the number of
   available parameters to the system call, which will haunt people who
   have to care about architectures with a limited set of registers
   reserved for this purpose.

   Another aspect is that it is most likely slower.  The caller in
   most cases knows exactly which sub-function of the system call is
   needed.  If the decision about the sub-function is dynamic, the
   computation of the code could just as well be a computation of a
   system call number.  The difference lies in the kernel, where the
   multiplexing always has to happen, even if the required
   sub-function is known to the caller ahead of time.

   Adding new system calls is much cheaper: it is a word in a table.
   This is much less code and data than the switch statement or
   if-cascade needed to implement the multiplexer.

   Bad examples: sys_socketcall on x86, sys_futex, and several more

2. Use of ENOSYS:

   The runtime has to be able to distinguish non-existing system calls
   due to old kernel versions from error conditions in an implemented
   system call.  This means the ENOSYS error should never be used in
   an error condition once a system call is implemented.

   Example: In sys_fallocate, if the file system does not implement the
   fallocate operation, return EOPNOTSUPP and not ENOSYS.
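The sys_fallocate example can be sketched in user-space C roughly as follows. This is an illustration, not kernel code: the operation table, the handler, and the caller-side check are hypothetical names invented for the sketch. It assumes only the errno constants from <errno.h> and the usual convention of returning a negative errno value.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a file system's table of operations. */
struct fs_ops_sketch {
    long (*fallocate)(long long offset, long long len);
};

/* Hypothetical handler modelled on the sys_fallocate example.  The
 * system call itself exists, so it never reports -ENOSYS. */
long fallocate_sketch(const struct fs_ops_sketch *ops,
                      long long offset, long long len)
{
    if (offset < 0 || len <= 0)
        return -EINVAL;
    if (ops->fallocate == NULL)
        return -EOPNOTSUPP;     /* file system lacks the operation */
    return ops->fallocate(offset, len);
}

/* Caller side: only -ENOSYS means "this kernel has no such call". */
int syscall_is_missing_sketch(long ret)
{
    return ret == -ENOSYS;
}
```

With this split, a runtime that sees -ENOSYS can remember that the kernel is too old and stop trying the call, while -EOPNOTSUPP tells it the call exists but this particular file does not support the operation.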
   There is one exception to the rule: if rule #1 is violated and a
   multiplexer system call is used, invalid sub-function codes should
   be signaled using ENOSYS.

   Example: sys_futex

3. Choose parameters for growth

   Today it no longer makes sense to implement any system call which,
   even on 32-bit machines, restricts values indicating file sizes or
   offsets to 32 bits.  64-bit values should be used throughout.

   Example: sys_fadvise64, which should have been defined from day 1
   like sys_fadvise64_64.

   Similarly, timeout granularity of seconds is not suitable anymore.
   Most interfaces use nanosecond resolution, and an often-used way
   to specify such times and intervals is the timespec structure.

4. 32-bit compatibility

   Kernels for architectures like x86-64 and PPC64 have to be able to
   execute 32-bit binaries as well.  The implementation of the actual
   system calls is of course shared.  The types for the system call
   parameters and return values on 32-bit and 64-bit systems can be
   different.  This is where compatibility wrappers come in.

   These functions, usually named compat_sys_XYZ for a system call
   sys_XYZ, are only needed when a system call parameter is a pointer
   to a structure which has a different representation in 32- and
   64-bit mode.  Differences in the size of integer or pointer
   arguments do not require a compatibility wrapper.

   Example: compat_sys_utimensat, which has to convert a timespec
   structure from 32-bit to 64-bit.  See also rule #3.
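The layout difference behind rule #4 can be illustrated with a small user-space sketch. The two structure types and the conversion function below are hypothetical stand-ins, not the kernel's definitions. They assume a 32-bit ABI in which both timespec members are 32 bits wide and a 64-bit ABI in which both are 64 bits wide.

```c
#include <stdint.h>

/* Hypothetical layout as seen by a 32-bit binary: 8 bytes in total. */
struct compat_timespec_sketch {
    int32_t tv_sec;
    int32_t tv_nsec;
};

/* Hypothetical layout used by the 64-bit kernel: 16 bytes in total. */
struct native_timespec_sketch {
    int64_t tv_sec;
    int64_t tv_nsec;
};

/* The job of a compatibility wrapper, reduced to its core: widen each
 * member so the shared implementation sees its native structure.
 * No validation is done here; that stays in the shared code. */
void widen_timespec_sketch(const struct compat_timespec_sketch *in,
                           struct native_timespec_sketch *out)
{
    out->tv_sec = in->tv_sec;    /* implicit conversion sign-extends */
    out->tv_nsec = in->tv_nsec;
}
```

Because the two structures differ in size, the 64-bit kernel cannot simply reinterpret the memory a 32-bit caller points to, which is why a pointer to such a structure needs a wrapper while a plain integer argument does not.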