Re: The lack of specification (was Re: [LONG RANT] Re: Linux stifles innovation... )
Mikulas Patocka <[EMAIL PROTECTED]> writes:

> Imagine that there is specification of mark_buffer_dirty. That
> specification says that
> 1. it may not block
> 2. it may block
>
> In case 1. implementors wouldn't change it to block in stable kernel
> release because they don't want to violate the specification.
>
> In case 2. implementors of ext2 wouldn't assume that it doesn't block
> even if it doesn't in current implementation.

Whenever the question has been asked, the answer has always been: assume
that anything in the kernel outside of the current function blocks.

> In both cases, the bug wouldn't be created.

Nope. It looks like someone made a mistake in ext2...

> Anytime you change implementation of syscalls, you gotta check all
> applications that use them ;-) Luckily not - because there is
> specification and you can check that syscalls conform to the
> specification, not apps.

Not normally. The rule is that syscalls don't change, period. The
internal kernel interface is different. It is allowed to change.

As for syscalls changing: auditing most apps did happen when the LFS spec
was put together, so that you would have an implementation that would keep
most apps from failing on large files.

> > > Saying "code is the specification" is not good.
> >
> > I'm not arguing against documentation. That is dumb. But the code is
> > ALWAYS canonical. Not docs.
>
> Let's see:
> Who is right? If there is no specification

Hmm. The developers should get together and pow wow when the problem is
noticed. When it is finally talked out how it should work, the code
should get fixed accordingly. It isn't about right and wrong; it is about
working code.

Not that documenting things doesn't help. And 2.4 is going in that
direction...

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: The lack of specification (was Re: [LONG RANT] Re: Linux stifles innovation... )
> One of these things must happen:
>
> a. follow the specification, even if that makes code slow and contorted
> b. change the specification
> c. ignore the specification
> d. get rid of the specification
>
> Option "a" will not be accepted around here. Sorry.

It should be followed in stable releases. (And it usually is, except for
a few cases, and except that there is no specification, just unwritten
rules.)

> The best you can
> hope for is option "b". Since that is hard work (want to help?) we
> often end up not using a specification... hopefully by just not
> having one, instead of by ignoring one.

> > Now implementors of TCP will say: that driver is buggy. Everybody should
> > set state=TASK_RUNNING before calling schedule to yield the process.
> >
> > Implementors of driver will say: TCP is buggy - no one should call my
> > driver in TASK_[UN]INTERRUPTIBLE state.
> >
> > Who is right? If there is no specification
>
> The driver is buggy, unless the TCP maintainer can be convinced
> that TCP is buggy. TCP is a big chunk of code that most people use,
> while the driver is not so huge or critical.
>
> The TCP maintainers do not seem to be sadistic bastards hell-bent on
> breaking your drivers. API changes usually have a good reason.

Why should block device developers read TCP/IP code? Only after reading
a significant amount of it would they realize that they can be called in
TASK_INTERRUPTIBLE state. They will most likely read other block drivers,
find schedule used without setting the state, and use it the same way.

The only way to tell developers to always set the state before using
schedule is to write it into a specification.

Mikulas
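The lockup being argued about can be sketched in user space. What follows is a toy model under stated assumptions, not kernel code: `struct task`, `toy_schedule`, `driver_yield_unsafe`, `driver_yield_safe` and `survives_yield` are hypothetical names invented for illustration, and the model assumes only what the thread itself says, namely that a task calling schedule while not in TASK_RUNNING state stops running until something wakes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: state names mirror the ones discussed in the thread,
 * but nothing here is taken from kernel source. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE };

struct task {
    enum task_state state;
    bool on_runqueue;   /* false means: sleeps until someone wakes it */
};

/* Hypothetical stand-in for schedule(): a task that yields while not
 * TASK_RUNNING is removed from the run queue. */
void toy_schedule(struct task *t)
{
    assert(t != NULL);
    if (t->state != TASK_RUNNING)
        t->on_runqueue = false;
}

/* Part (2) of the scenario: a driver that yields and silently assumes
 * the caller left the state at TASK_RUNNING. */
void driver_yield_unsafe(struct task *t)
{
    toy_schedule(t);
}

/* The rule one side of the thread proposes: always set the state first. */
void driver_yield_safe(struct task *t)
{
    t->state = TASK_RUNNING;
    toy_schedule(t);
}

/* Part (1) of the scenario: the caller sets a sleeping state and then,
 * through some call chain, ends up in the driver's yield with no wakeup
 * pending. Returns true if the task is still runnable afterwards. */
bool survives_yield(enum task_state caller_state,
                    void (*yield)(struct task *))
{
    struct task t = { .state = TASK_RUNNING, .on_runqueue = true };

    t.state = caller_state;
    yield(&t);
    return t.on_runqueue;
}
```

In this model `survives_yield(TASK_INTERRUPTIBLE, driver_yield_unsafe)` is false while the same call with `driver_yield_safe` is true. Neither function is wrong in isolation; only the combination fails, which is why the thread keeps returning to the question of where the rule gets written down.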
Re: The lack of specification (was Re: [LONG RANT] Re: Linux stifles innovation... )
Mikulas Patocka writes:

> Imagine that there is specification of mark_buffer_dirty. That
> specification says that
> 1. it may not block
> 2. it may block
>
> In case 1. implementors wouldn't change it to block in stable kernel
> release because they don't want to violate the specification.

One of these things must happen:

a. follow the specification, even if that makes code slow and contorted
b. change the specification
c. ignore the specification
d. get rid of the specification

Option "a" will not be accepted around here. Sorry. The best you can
hope for is option "b". Since that is hard work (want to help?) we
often end up not using a specification... hopefully by just not
having one, instead of by ignoring one.

Not saying it doesn't suck to have things undocumented, but at least
you don't have to reverse-engineer a multi-megabyte binary kernel
to find out what is going on.

>> Anytime you change implementation, you gotta check all drivers that use
>> them. I know, I'm one of the grunts that does such reviews and changes.
>
> Anytime you change implementation of syscalls, you gotta check all
> applications that use them ;-) Luckily not - because there is
> specification and you can check that syscalls conform to the
> specification, not apps.

Syscalls are more stable, but they may be changed after a transition
period of many years. The C library hides some of this from users.

> Now implementors of TCP will say: that driver is buggy. Everybody should
> set state=TASK_RUNNING before calling schedule to yield the process.
>
> Implementors of driver will say: TCP is buggy - no one should call my
> driver in TASK_[UN]INTERRUPTIBLE state.
>
> Who is right? If there is no specification

The driver is buggy, unless the TCP maintainer can be convinced
that TCP is buggy. TCP is a big chunk of code that most people use,
while the driver is not so huge or critical.

The TCP maintainers do not seem to be sadistic bastards hell-bent on
breaking your drivers. API changes usually have a good reason.
The lack of specification (was Re: [LONG RANT] Re: Linux stifles innovation... )
> > > > I suspect part of the problem with commercial driver support on
> > > > Linux is that the Linux driver API (such as it is) is relatively
> > > > poorly documented
> > >
> > > In-kernel documentation, agreed.
> > >
> > > _Linux Device Drivers_ is a good reference for 2.2 and below.
> >
> > And do implementors of generic kernel functions and developers of device
> > drivers respect it? And how can they respect it if it's a commercial book?
>
> _Linux Device Drivers_ documents the 2.2 (and previous) API, and
> thus refutes the argument that the kernel API is poorly documented.
> Since the publication of the book -succeeds- the publication of the
> APIs, your questions are not applicable.

What does it say about mark_buffer_dirty blocking, or about the schedule
and TASK_[UN]INTERRUPTIBLE issues?

If it says nothing, it is bad documentation.
If it says something, kernel developers do not respect it and it is useless
documentation...

> > > > and seems
> > > > to change almost on a week-by-week basis anyway. I've done my share
> > > > of chasing the current kernel revision with drivers that aren't part
> > > > of the kernel tree: by the time you update the driver to work with
> > > > the current kernel revision, there's a new one out, and the driver
> > > > doesn't compile with it.
> > >
> > > This is entirely in your imagination. Driver APIs are stable across the
> > > stable series of kernels: 2.0.0 through 2.0.38, 2.2.0 through 2.2.18,
> > > 2.4.0 through whatever.
> >
> > Not true. Do you remember for example the mark_buffer_dirty change in some
> > 2.2.x that triggered ext2 directory corruption? (mark_buffer_dirty was
> > changed so that it could block).
> >
> > Another example of a bug that comes from the lack of specification is
> > calling of get_free_pages by non-running processes that caused lockups on
> > all kernels < 2.2.15. And it is still not cleaned up - see tcp_recvmsg().
> >
> > Having documentation could prevent this kind of bug.
>
> Hardly.

Imagine that there is a specification of mark_buffer_dirty. That
specification says one of two things:
1. it may not block
2. it may block

In case 1, implementors wouldn't change it to block in a stable kernel
release, because they wouldn't want to violate the specification.

In case 2, implementors of ext2 wouldn't assume that it doesn't block,
even if it doesn't in the current implementation.

In both cases, the bug wouldn't be created.

> No documentation is often -better- than bad documentation.

Of course. But good documentation is better than no documentation :-)

> > You don't need too
> > long texts, just a brief description: "this function may be called from
> > process/bh/interrupt context, it may/may not block, it may/may not be
> > called in TASK_[UN]INTERRUPTIBLE state, it may take these locks."
> >
> > With documentation developers would be able to change implementation of
> > kernel functions without the need to recheck all drivers that use them.
>
> Anytime you change implementation, you gotta check all drivers that use
> them. I know, I'm one of the grunts that does such reviews and changes.

Anytime you change the implementation of syscalls, you gotta check all
applications that use them ;-) Luckily not - because there is a
specification, and you can check that the syscalls conform to the
specification, not the apps.

> > Saying "code is the specification" is not good.
>
> I'm not arguing against documentation. That is dumb. But the code is
> ALWAYS canonical. Not docs.

Let's see:

There are parts of code (1) that set the state to TASK_[UN]INTERRUPTIBLE
and then call some other complex functions, like page fault handlers (for
example tcp in 2.2).

There are parts of code (2) that call schedule to yield the process,
assuming that the state is TASK_RUNNING (including some drivers).

Sooner or later it will happen that a subroutine called from part (1)
somehow gets to part (2), and the process locks up.

Now implementors of TCP will say: that driver is buggy. Everybody should
set state=TASK_RUNNING before calling schedule to yield the process.

Implementors of the driver will say: TCP is buggy - no one should call my
driver in TASK_[UN]INTERRUPTIBLE state.

Who is right? If there is no specification

Mikulas
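The "brief description" proposed in this message could be written as a small record per function and checked mechanically at each call site. The sketch below is one hypothetical way to do that in plain C; `struct kfunc_contract`, `struct call_site`, `check_call` and all of the example values are assumptions made up for illustration. In particular, the two `dirty_*` records do not describe the real mark_buffer_dirty; they only stand for the "may not block" and "may block" variants of the thought experiment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Calling contexts named in the message: process, bh, interrupt. */
enum ctx { CTX_PROCESS = 1, CTX_BH = 2, CTX_INTERRUPT = 4 };

/* Hypothetical one-record "specification" of a kernel function. */
struct kfunc_contract {
    const char *name;
    unsigned allowed_ctx;     /* bitmask of enum ctx values */
    bool may_block;
    bool needs_task_running;  /* must not be called in a sleeping state */
};

/* What is known about one particular caller. */
struct call_site {
    enum ctx ctx;
    bool holds_spinlock;      /* caller cannot tolerate blocking */
    bool task_running;        /* caller left the state at TASK_RUNNING */
};

/* Returns NULL if the call respects the contract, otherwise a short
 * description of the first violation found. */
const char *check_call(const struct kfunc_contract *c,
                       const struct call_site *s)
{
    assert(c != NULL && s != NULL);
    if (!(c->allowed_ctx & (unsigned)s->ctx))
        return "called from a context the contract does not allow";
    if (c->may_block && (s->holds_spinlock || s->ctx != CTX_PROCESS))
        return "may block, but the caller cannot sleep here";
    if (c->needs_task_running && !s->task_running)
        return "requires TASK_RUNNING, but the caller changed the state";
    return NULL;
}

/* Illustrative values only: the two variants of the thought experiment. */
const struct kfunc_contract dirty_nonblocking = {
    .name = "mark_dirty (case 1)", .allowed_ctx = CTX_PROCESS | CTX_BH,
    .may_block = false, .needs_task_running = false,
};
const struct kfunc_contract dirty_blocking = {
    .name = "mark_dirty (case 2)", .allowed_ctx = CTX_PROCESS,
    .may_block = true, .needs_task_running = false,
};

/* A caller that relies on the function not blocking, and one that does not. */
const struct call_site caller_holding_lock = {
    .ctx = CTX_PROCESS, .holds_spinlock = true, .task_running = true,
};
const struct call_site caller_plain = {
    .ctx = CTX_PROCESS, .holds_spinlock = false, .task_running = true,
};
```

With these records, `check_call(&dirty_nonblocking, &caller_holding_lock)` returns NULL, while switching the contract to `dirty_blocking` flags the same caller. That is the argument in miniature: once the contract changes, the callers to recheck can be found from the record rather than by rereading every driver.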