Re: [f2fs-dev] [PATCH] f2fs: give RO message when recovering superblock
On Wed, Mar 23, 2016 at 01:38:19PM -0700, Jaegeuk Kim <jaeg...@kernel.org> wrote:
> When one of superblocks is missing, f2fs recovers it with the valid one.
> But, even if f2fs is mounted as RO, we'd better notify that too.

(I have written this in my other mail, but in case you didn't see it, because it wasn't directly sent to you, I am replying directly.)

Basically all other filesystems do not treat "ro" as anything but a vfs flag - the mounted volume will be readonly, but they will happily write to the volume for recovery or integrity purposes.

This has been extensively discussed on lkml in the past and it was decided that overloading "ro" to have two different meanings is bad.

If f2fs wants to suppress writes, it should use the norecovery option to decide, not the ro option. This is the behaviour that other filesystems follow (at least extN, xfs).

Unless f2fs has a very good reason (which I don't think it has), it should behave like the other filesystems, and treat "ro" merely as a vfs flag to suppress writing.

There is a third reason to not change the meaning: typically, the root fs is mounted ro first and later rw. Therefore f2fs must make sure to have full integrity on a ro mount, even if that means writing to the backing store. It isn't acceptable to make ro mounts fail when rw mounts would work, for example, when upgrading the kernel and rebooting.

-- The choice of a Deliantra, the free code+content MORPG -==- _GNU_ http://www.deliantra.net ==-- _ generation ---==---(_)__ __ __ Marc Lehmann --==---/ / _ \/ // /\ \/ / schm...@schmorp.de -=/_/_//_/\_,_/ /_/\_\
Re: epoll design problems with common fork/exec patterns
On Sat, Oct 27, 2007 at 12:23:52PM +0200, Eric Dumazet <[EMAIL PROTECTED]> wrote:
> >Q6 Will the close of an fd cause it to be removed from all epoll
> >sets automatically?
> >A6 Yes.
>
> Answer : epoll documentation cannot explain the full semantic of file

The epoll documentation easily can; there is nothing keeping it from doing so. Don't make silly arguments like that.

> Or should, since you had problems

You are again implying I lack understanding. That is, however, not true. I don't see the point in being insulted by you, so I won't continue talking to you :(

> The 'close' of a file is not close(fd) :)

Good that you understand that. That is one of my problems, as the manpage talks about closing of the fd, but there are multiple ways to do that, and some are not handled the same way.

> epoll has to deal with files, but documentation is a User side
> documentation, so has to use 'file descriptors'.

There is obviously no need for documentation to do that, contrary to your claim. The manpages for e.g. dup or the official SUS manpages manage to document it (mostly) correctly, so your claim that documentation must use file descriptors when the underlying file structure is meant is disproven.

> fork() is acting sort of dup() , as it increases all file refcounts.
>
> You have problems about close()/dup()/fork()/... file descriptors semantic,
> which is handled by a layer independent from epoll stuff.

No, I have no problem with dup at all. My problem is that explicitly closing file descriptors in the child stops events for those files from being reported in the parent.

I am sorry, but I have explained this very clearly a number of times. For some reason, apart from accusing me of not understanding files and file descriptors or the (clear enough) documentation, you ignore that and instead hammer on other problems. To me, it seems you are the one who does not understand.
-- The choice of a Deliantra, the free code+content MORPG -==- _GNU_ http://www.deliantra.net ==-- _ generation ---==---(_)__ __ __ Marc Lehmann --==---/ / _ \/ // /\ \/ / [EMAIL PROTECTED] -=/_/_//_/\_,_/ /_/\_\ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
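The distinction between closing a file and calling close(fd), which this message argues about, can be illustrated with dup(). The following is a minimal sketch, assuming Linux with epoll_create1() (2.6.27 or later); the function name and the 200 ms timeout are made up for illustration. According to the current epoll(7) manpage, a registration goes away only once every descriptor referring to the underlying open file description is closed, so an event should still be reported for the closed fd as long as the duplicate stays open.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns the number of events epoll_wait() reports
 * for a pipe read end that was registered, duplicated and then closed.
 * Returns -1 on a setup error. Error paths leak fds; this is a sketch. */
int event_after_close_with_dup(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    int rfd = p[0], wfd = p[1];

    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = rfd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, rfd, &ev) < 0)
        return -1;

    /* Keep the open file description alive through a second descriptor,
     * then close the descriptor that was registered. */
    int copy = dup(rfd);
    if (copy < 0)
        return -1;
    close(rfd);

    /* Make the pipe readable. */
    if (write(wfd, "x", 1) != 1)
        return -1;

    struct epoll_event out;
    int n = epoll_wait(epfd, &out, 1, 200);

    close(copy);
    close(wfd);
    close(epfd);
    return n;
}
```

Calling the helper from a small main() and printing the result is enough to try it. The point of the design is that the registered descriptor number is never passed to EPOLL_CTL_DEL; only close() is used, which is exactly the case the manpage wording leaves open.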
Re: epoll design problems with common fork/exec patterns
On Sat, Oct 27, 2007 at 11:22:25AM +0200, Eric Dumazet <[EMAIL PROTECTED]> wrote:
> >Well, it behaves like documented, which is the problem. You admit you
> >don't understand the problem or the documentation, so again, no need to
> >insult me.
>
> Hum... I will update my english vocabulary and mark "missed" as an insult.

Well, ignoring my arguments by claiming I lack understanding is an insult, as you didn't take my arguments at face value but dismissed them by attacking my person.

> I have no problem with epoll nor its documentation.

That's fine for you. But I do, at least with epoll, as the documented and observed behaviour makes epoll unusable as a general event loop replacement.

> It doesnt on every kernels I had played with. And I played with *lot* of
> kernels you know.

No, I don't know that. And so far you only said you used fork+exec, not close in between, so maybe the playing you did was not related to this problem?

I also played with a lot of kernels, but for epoll specifically, I played with 2.6.21-2-amd64 and 2.6.22-1-amd64, both from debian unstable with no customisations.

> If such a bug exists on your kernel, please fill a complete bug report,
> giving details.

As this behaviour is clearly documented in the epoll manpage, why do you think it is a bug? I think it's fairly bad, but at least it's documented as the behaviour it should be:

   Q6 Will the close of an fd cause it to be removed from all epoll
      sets automatically?
   A6 Yes.

As such, filing a bug report for behaviour which isn't in fact a bug would be counterproductive.

My goal in my mail was to find out if there are workarounds for this peculiar behaviour (or to inspire discussion on this behaviour).

Of course, one can create big programs using epoll to their advantage. I never claimed otherwise. But as a general event loop replacement (i.e. outside of controlled environments), epoll does not currently qualify, as I would have to control an awful lot of code (think of a perl module interfacing to epoll: you would have to control all third-party modules that might interfere with fork+close+exec. This is very common in scripting languages).

-- The choice of a Deliantra, the free code+content MORPG -==- _GNU_ http://www.deliantra.net ==-- _ generation ---==---(_)__ __ __ Marc Lehmann --==---/ / _ \/ // /\ \/ / [EMAIL PROTECTED] -=/_/_//_/\_,_/ /_/\_\
Re: epoll design problems with common fork/exec patterns
On Sat, Oct 27, 2007 at 10:23:17AM +0200, Eric Dumazet <[EMAIL PROTECTED]> wrote:
> > In this case, the parent process works fine until the child closes fds,
> > after which the fds become unarmed in the parent too. This works as
>
> I have no idea what exact problem you have.

Well, I explained it rather succinctly, I think. If you tell me what's unclear I can explain...

> But if the child closes some
> file descriptor that were 'cloned' at fork() time, this only decrements a
> refcount, and definitely should not close it for the 'parent'.

It doesn't. It removes it from the epoll set, though, so the parent will not receive events for that fd anymore.

> I have some apps that are happily using epoll() and fork()/exec() and have

The problem I described is fork/close/exec, close being the explicit syscall.

> no problem at all. I usually use O_CLOEXEC so that all close() are done at
> exec() time without having to do it in a loop. epoll continues to work as
> expected in the parent process.

This is because epoll doesn't behave as documented: it removes the fd from the parent's epoll set only on an explicit close() syscall, not on an implicit close from exec.

> >fd sets. This would explain the behaviour above. Unfortunately (or
> >fortunately?) this is not what happens: when the fds are being closed by
> >exec or exit, the fds do not get removed from the epoll set.
>
> at exec() (granted CLOEXEC is asserted) or exit() time, only the refcount
> of each file is decremented. Only if their refcount becomes NULL, files are
> then removed from epoll set.

Yes. But that's obviously not the only way to close fds.

> >Is epoll really designed to be so incompatible with the most common fork
> >patterns? Shouldn't epoll do refcounting, as is commonly done under
> >Unix? As the fd space is not shared between processes, why does epoll
> >try? Shouldn't the epoll information be copied just like the fd table
> >itself, memory, and other resources?
>
> Too many questions here, showing lack of understanding.

You already said you don't understand the problem. No need to get insulting :(

> epoll definitely is not useless. It is used on major and critical apps.
> You certainly missed something.

Well, it behaves like documented, which is the problem. You admit you don't understand the problem or the documentation, so again, no need to insult me.

> Please provide some code to illustrate one exact problem you have.

   // assume there is an open epoll set that listens for events on fd 5
   if (fork () == 0)
     {
       close (5);
       // fd 5 is now removed from the epoll set of the parent.
       _exit (0);
     }

-- The choice of a -==- _GNU_ ==-- _ generation Marc Lehmann ---==---(_)__ __ __ [EMAIL PROTECTED] --==---/ / _ \/ // /\ \/ / http://schmorp.de/ -=/_/_//_/\_,_/ /_/\_\ XX11-RIPE
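The fork/close fragment in this message assumes an already existing epoll set. A self-contained way to check the scenario on a given kernel might look like the sketch below. It assumes Linux with epoll_create1() (2.6.27 or later); the function name and the 200 ms timeout are hypothetical choices. It makes no claim about which result the 2.6.2x kernels named in the thread produce. Note that the current epoll(7) manpage describes removal as happening only once all descriptors for the open file description are closed, so a result of 1 would be expected on kernels that follow that text.

```c
#include <sys/epoll.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the parent still receives an event for
 * a registered pipe read end after a forked child has explicitly closed its
 * copy of that descriptor, 0 if the wait times out, -1 on a setup error.
 * Error paths leak fds; this is a sketch. */
int parent_event_after_child_close(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = p[0] };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(p[0]);   /* the explicit close() syscall from the mail */
        _exit(0);
    }

    /* Wait until the child is gone, so its close() has certainly happened. */
    if (waitpid(pid, NULL, 0) < 0)
        return -1;

    /* Make the pipe readable and see whether the parent is told about it. */
    if (write(p[1], "x", 1) != 1)
        return -1;

    struct epoll_event out;
    int n = epoll_wait(epfd, &out, 1, 200);

    close(p[0]);
    close(p[1]);
    close(epfd);
    return n;
}
```

Waiting for the child before writing removes the race mentioned elsewhere in the thread, where the parent cannot know when the child performs its close().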
epoll design problems with common fork/exec patterns
Hi!

I ran into what I see as unsolvable problems that make epoll useless as a generic event mechanism.

I recently switched to libevent as event loop, and found that my programs work fine when it is using select or poll, but work erratically or halt when using epoll.

The reason, as I found out, is the peculiar behaviour of epoll over fork. It doesn't work as documented, and even if it did, it would usually make the use of third-party libraries that fork impossible.

Here are two scenarios where it screws up:

- some library forks, explicitly closes all fds it doesn't need, and execs another program (which is common behaviour).

  In this case, the parent process works fine until the child closes fds, after which the fds become unarmed in the parent too. This works as documented, but since libraries expect this to work without affecting the parent, this puts a new and incompatible strain on what libraries can do, which in turn makes epoll unsuitable in cases where you don't control all your code.

- I have a library that emulates asynchronous I/O with a thread pool, and uses a pipe for event notification. That library registers a fork handler that closes the pipe in the child and recreates it, so the child could continue doing AIO (as could the parent).

  This, too, screws up notifications for the parent.

Now, the epoll manpage says that closing a fd will remove it from all fd sets. This would explain the behaviour above. Unfortunately (or fortunately?) this is not what happens: when the fds are being closed by exec or exit, the fds do not get removed from the epoll set.

This behaviour strikes me as extremely illogical. On the one hand, one cannot share the epoll fd between processes normally, but on fork, you can, even though it makes no sense (the child has a different fd "namespace" than the parent) and actually works on (then) unrelated fds in the other process.

It also strikes me as weird that the order of closing fds should make so much of a difference: if the epoll fd is closed first in the child, the other fds will survive in the parent; if it's closed last, they don't.

Makes no sense to me.

Now, the problem I see is not that it makes no sense to me - that's clearly my problem. The problem I see is that there is no way to avoid the associated problems except by patching all code that would ever use fork, even if it has never heard anything about epoll. This is extremely nonlocal action at a distance, as this affects a lot of code not even the author might be aware of (fork is rather common).

To illustrate, here are some workarounds I thought about:

- rearming all fds after fork: doesn't work, as the fds get removed asynchronously, so I would have to wait for the child to do it.

- closing the epoll fd after fork: doesn't work unless I control the fork. I can install a handler to be called using pthreads, but that won't help as other handlers might be called first (as in the case of the aio library above), screwing me.

- closing and recreating the epoll fd before the fork: isn't supported even remotely by libevent or similar event loops, and would not help either as I cannot control the calls to fork.

Is epoll really designed to be so incompatible with the most common fork patterns? Shouldn't epoll do refcounting, as is commonly done under Unix? As the fd space is not shared between processes, why does epoll try? Shouldn't the epoll information be copied just like the fd table itself, memory, and other resources?

As it looks now, epoll looks useless except in the most controlled environments, as it doesn't duplicate state on fork as is done with the other fd-related resources (as opposed to the underlying files, which are properly shared).
-- The choice of a -==- _GNU_ Deliantra, the free in data+content MORPG ==-- _ generation ---==---(_)__ __ __ http://www.deliantra.net/ --==---/ / _ \/ // /\ \/ / -=/_/_//_/\_,_/ /_/\_\
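The closing observation of this message, that epoll state is shared rather than duplicated across fork, can be illustrated with a short sketch. It assumes Linux with epoll_create1() (2.6.27 or later); the function name and the 200 ms timeout are invented for the example. The child inherits a descriptor for the same epoll instance, so an EPOLL_CTL_DEL issued in the child changes the interest list the parent is waiting on. This is a different operation from the plain close() discussed in the thread, and is shown only to make the sharing visible.

```c
#include <sys/epoll.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child removes a registration through the epoll
 * descriptor it inherited. Returns what the parent's epoll_wait() reports
 * afterwards (0 means the wait timed out), or -1 on a setup error.
 * Error paths leak fds; this is a sketch. */
int parent_event_after_child_ctl_del(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = p[0] };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Same epoll instance as in the parent, not a copy of it. */
        int rc = epoll_ctl(epfd, EPOLL_CTL_DEL, p[0], NULL);
        _exit(rc == 0 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    /* The pipe becomes readable, but the registration is gone. */
    if (write(p[1], "x", 1) != 1)
        return -1;

    struct epoll_event out;
    int n = epoll_wait(epfd, &out, 1, 200);

    close(p[0]);
    close(p[1]);
    close(epfd);
    return n;
}
```

An event loop that keeps running in a forked child and adjusts its interest list there would have this effect on the parent, which is one reason the "close the epoll fd first in the child" workaround listed above matters.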
Re: masquerading failure for at least icmp and tcp+sack on amd64
On Tue, Sep 06, 2005 at 07:29:30PM +0200, Marc Lehmann <[EMAIL PROTECTED]> wrote:
> Weird observation 2:
>
> Some sites could be connected to with TCP. It turned out that those
> sites did not support TCP SACK. Indeed, turning off SACK either on the
> remote side of a connection or on the originator side resulted in working
> masquerading:

Sorry for the follow-up, but this turned out to be slightly untrue: turning off SACK makes the SYN handshake happen, but some packets further down the stream the masquerading router sends a RST again.

> Kernels that don't work:
>
>2.6.13-rc7 (compiled with gcc-3.4 and 4.0.2 debian), 2.6.13 (gcc-4.0.2)

I forgot to mention that the kernels that don't work are for amd64.

In the meantime, I also tried out 2.6.11 (as I had some troubles with 2.6.12..2.6.13-rc7 on other amd64 machines), with the same result (reply packets are ignored/rejected).

-- The choice of a -==- _GNU_ ----==-- _ generation Marc Lehmann ---==---(_)__ __ __ [EMAIL PROTECTED] --==---/ / _ \/ // /\ \/ / http://schmorp.de/ -=/_/_//_/\_,_/ /_/\_\ XX11-RIPE
masquerading failure for at least icmp and tcp+sack on amd64
Hi!

I recently upgraded a 32 bit machine to a new amd64 board+cpu. I took the same kernel (2.6.13-rc7) and just recompiled it for 64 bit, plus upgraded userspace to 64 bit. Firewall config stayed the same.

Problem: neither ping nor tcp was being masqueraded properly. I created the following test set-up:

   iptables -t mangle -F
   iptables -t filter -F
   iptables -t nat -F
   iptables -t nat -A POSTROUTING -p all -s 10.0.0.0/8 -d \! 10.0.0.0/8 -j MASQUERADE

i.e. the above masquerade rule should be the only firewall rule, and all rules should have policy ACCEPT.

The effect was that tcp packets and icmp packets coming from 10.0.0.1 on interface eth0 were properly masqueraded on the outgoing "inet" interface (ppp0 renamed):

   eth0:
   19:17:24.364351 IP 10.0.0.1.44320 > 129.13.162.95.80: S 3745828676:3745828676(0) win 5840

   inet:
   19:17:24.364505 IP 84.56.237.68.44320 > 129.13.162.95.80: S 3745828676:3745828676(0) win 5840
   19:17:24.378029 IP 129.13.162.95.80 > 84.56.237.68.44320: S 3777391404:3777391404(0) ack 3745828677 win 5840
   19:17:24.378103 IP 84.56.237.68.44320 > 129.13.162.95.80: R 3745828677:3745828677(0) win 0

However, the reverse packets were rejected. ip_conntrack showed this:

   tcp 6 52 SYN_SENT src=10.0.0.1 dst=129.13.162.95 sport=44320 dport=80 [UNREPLIED] src=129.13.162.95 dst=84.56.237.68 sport=80 dport=44320 mark=0 use=1

ICMP echo requests were also masqueraded, but the replies were ignored.

Weird observation 1:

   ip route del default
   ip route add default via 10.0.0.17

This resulted in working masquerading, this time over device "vpn0", which is a tuntap interface. Working means that outgoing packets were correctly rewritten with source 10.0.0.5 (local address of vpn0) and replies were correctly "un"-translated.

Weird observation 2:

Some sites could be connected to with TCP. It turned out that those sites did not support TCP SACK. Indeed, turning off SACK either on the remote side of a connection or on the originator side resulted in working masquerading:

   eth0:
   19:23:29.928470 IP 10.0.0.1.45611 > 129.13.162.95.80: S 4113365634:4113365634(0) win 5840
   19:23:29.942246 IP 129.13.162.95.80 > 10.0.0.1.45611: S 4161877683:4161877683(0) ack 4113365635 win 5840
   19:23:29.942313 IP 10.0.0.1.45611 > 129.13.162.95.80: . ack 1 win 5840

   inet:
   19:23:29.928249 IP 84.56.237.68.45611 > 129.13.162.95.80: S 4113365634:4113365634(0) win 5840
   19:23:29.942199 IP 129.13.162.95.80 > 84.56.237.68.45611: S 4161877683:4161877683(0) ack 4113365635 win 5840
   19:23:29.942332 IP 84.56.237.68.45611 > 129.13.162.95.80: . ack 1 win 5840

However, ICMP still is not masqueraded.

Kernels that worked:

   2.6.13-rc7, 2.6.12.5, 2.6.11 and lower, compiled for x86 with gcc-3.4

Kernels that don't work:

   2.6.13-rc7 (compiled with gcc-3.4 and 4.0.2 debian), 2.6.13 (gcc-4.0.2)

Kernel configuration was exactly the same for the 2.6.13-rc7 kernels, modulo the cpu and architecture selections.

I have a somewhat nontrivial source routing set-up on that machine that I could document in more detail if that could be a possible reason for the problem.

I am confident that this is not a configuration error, as the configuration has worked basically unchanged since the 2.4 days, and I am confident it's not an iptables setup problem either, as I can reproduce it with empty rules except for the masquerading rule.

I did not mention UDP because I didn't test it, but it's likely that UDP masquerading also fails.

Any idea what I could look at or try out to find out more about this problem?
-- The choice of a -----==- _GNU_ ==-- _ generation Marc Lehmann ---==---(_)__ __ __ [EMAIL PROTECTED] --==---/ / _ \/ // /\ \/ / http://schmorp.de/ -=/_/_//_/\_,_/ /_/\_\ XX11-RIPE - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
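The ip_conntrack entry quoted in the report can be read mechanically. The following is only a sketch: the field layout is assumed from that single sample line, and `parse_entry` and `masqueraded_source` are hypothetical helper names, not part of any netfilter tool. It shows that the NAT mapping itself was set up (the expected reply is addressed to 84.56.237.68), while the `[UNREPLIED]` flag says conntrack never accepted the SYN-ACK that tcpdump saw on the wire.

```python
import re

# Entry copied from the report; the field layout is assumed from this one sample.
ENTRY = ("tcp 6 52 SYN_SENT src=10.0.0.1 dst=129.13.162.95 sport=44320 dport=80 "
         "[UNREPLIED] src=129.13.162.95 dst=84.56.237.68 sport=80 dport=44320 "
         "mark=0 use=1")


def parse_entry(line):
    """Split a TCP conntrack line into original tuple, reply tuple and flag."""
    pairs = re.findall(r"(src|dst|sport|dport)=(\S+)", line)
    return {
        "proto": line.split()[0],
        "original": dict(pairs[:4]),   # direction as sent by the inside host
        "reply": dict(pairs[4:8]),     # direction conntrack expects to come back
        "unreplied": "[UNREPLIED]" in line,
    }


def masqueraded_source(entry):
    """The rewritten source address is the destination of the expected reply."""
    return entry["reply"]["dst"]
```

ICMP entries carry type/code/id fields instead of ports, so this sketch only covers the TCP case shown in the report.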
Re: Kernel BUG at "fs/exec.c":777
On Sun, Aug 21, 2005 at 01:49:45AM -0700, Andrew Morton <[EMAIL PROTECTED]> wrote:
> Marc Lehmann <[EMAIL PROTECTED]> wrote:
> > If wanted, I can probably reproduce
> > that without the nvidia kernel module loaded.
>
> Yes, please do that, thanks.

Ooops, you are not Alexander Nyberg :)

Sorry, to give my previous reply more context: I had a conversation with Alexander Nyberg, who wanted to debug this problem this weekend, and I gave detailed instructions on how to reproduce it (which is a bit awkward).

I also wrote a script that doesn't rely on X running, but it triggers the bug much less often (in fact, only twice for me so far), and then seemingly only the first time after a reboot (which *could* be caused by the very different timing of the stat()-threads due to the extra disk access).

Let's see what Alexander found out (if he found time).

The problem does not happen (or is not reproducible) with newer IO::AIO releases, as they don't start threads in the child after the fork/before the exec.

-- Marc Lehmann [EMAIL PROTECTED] http://schmorp.de/
Re: Kernel BUG at "fs/exec.c":777
On Sun, Aug 21, 2005 at 01:49:45AM -0700, Andrew Morton <[EMAIL PROTECTED]> wrote:
> Marc Lehmann <[EMAIL PROTECTED]> wrote:
> > If wanted, I can probably reproduce
> > that without the nvidia kernel module loaded.
>
> Yes, please do that, thanks.

I tried a few times, booting into textmode (it is the X server that loads the nvidia module) and running the oops script. On the third try I got the oops again, but not afterwards (I kept running it on the same machine).

-- Marc Lehmann [EMAIL PROTECTED] http://schmorp.de/
Kernel BUG at "fs/exec.c":777
(A courtesy CC: on replies would be appreciated, thanks)

Hi!

I get the above oops message (full details below) sometimes when running the CVS version of "cv", a gtk+ image viewer. I use kernel 2.6.12.5, but it also occurred on 2.6.11, which I ran earlier. Unfortunately, it only happens during interactive use (or at least my simple test scripts have been unable to reproduce the behaviour so far).

cv is a perl/gtk script that uses the IO::AIO module. That module starts a number of threads that basically emulate asynchronous I/O. The oops happens when cv starts to move files (which it does by fork+exec of /bin/mv, which is where it oopses).

IO::AIO has a pthread_atfork handler that recreates the aio threads after the fork (but doesn't kill the threads before the fork). The forked process then does an exec() and rarely oopses.

So what happens is:

    pthread_create (4 or more times)
    fork ("main" thread forks)
    pthread_create (4 or more times)
    exec ("main" thread execs)
    ...(very rarely oopses)

All in quick succession.

fs/exec.c:777 is:

    593 static inline int de_thread(struct task_struct *tsk)
    ...
    776         if (!thread_group_empty(current))
    777                 BUG();
    778         if (!thread_group_leader(current))
    779                 BUG();
    780         return 0;

2.6.11 oopsed at the same BUG().

The system is an SMP dual opteron in 64 bit mode with a kernel compiled with gcc-3.3 (I think) and the nvidia kernel module loaded (but the program only does X calls, no direct gl access). If wanted, I can probably reproduce that without the nvidia kernel module loaded.

If any other info is required to fix that bug I'll happily try to find out or test things.

Thanks!

The complete OOPS is:

    --- [cut here ] - [please bite here ] -
    Kernel BUG at "fs/exec.c":777
    invalid operand: [1] SMP
    CPU 0
    Modules linked in: nls_utf8 nls_cp850 vfat fat loop nvidia tg3 snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_seq_midi snd_seq_midi_event snd_seq snd_emu10k1 snd_seq_device snd_util_mem snd_hwdep w83627hf i2c_sensor i2c_isa amd64_agp 3w_9xxx
    Pid: 11032, comm: cv Tainted: P 2.6.12.5
    RIP: 0010:[8017e47b] 8017e47b{flush_old_exec+1531}
    RSP: 0018:810002cddd28 EFLAGS: 00010202
    RAX: 81001d064a90 RBX: 0001 RCX:
    RDX: 81001d064910 RSI: 81003e501680 RDI: 81003fec4e80
    RBP: 81002555ccc0 R08: 805b1880 R09: 0002
    R10: R11: 810001e104e0 R12: ffb0
    R13: 81002577a8c0 R14: 81002577b0c8 R15: 81003e501680
    FS: 2b1a7e10() GS:80576780() knlGS:56a06500
    CS: 0010 DS: ES: CR0: 8005003b
    CR2: 03563600 CR3: 15335000 CR4: 06e0
    Process cv (pid: 11032, threadinfo 810002cdc000, task 81001d064910)
    Stack: 00010101 801a2eee 0080 81001295e480 0080 81002dfbd600 8100 81002dfbd600 1295e480 81000a4c9b40
    Call Trace:
    801a2eee{dnotify_parent+46} 8019f4a7{load_elf_binary+1335}
    80157fd3{buffered_rmqueue+323} 8019ef70{load_elf_binary+0}
    8017ea3e{search_binary_handler+158} 8017ed82{do_execve+386}
    8010e72a{system_call+126} 8010d181{sys_execve+65}
    8010eb4a{stub_execve+106}
    Code: 0f 0b 05 ec 42 80 ff ff ff ff 09 03 65 48 8b 04 25 00 00 00
    RIP 8017e47b{flush_old_exec+1531} RSP 810002cddd28
    nfs warning: mount version older than kernel

-- Marc Lehmann [EMAIL PROTECTED] http://schmorp.de/
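The pthread_create / fork / pthread_create / exec sequence listed in the report can be imitated from user space. What follows is a rough sketch, not the reporter's script and not IO::AIO itself: it assumes a POSIX system, uses Python threads as stand-ins for the aio worker threads, and execs a trivial interpreter run instead of /bin/mv. The names `start_workers` and `fork_threads_exec` are hypothetical. On a fixed kernel the child simply execs and exits with status 0.

```python
import os
import sys
import threading
import time


def start_workers(count=4):
    """Start idle daemon threads, standing in for the aio worker threads."""
    for _ in range(count):
        threading.Thread(target=time.sleep, args=(0.2,), daemon=True).start()


def fork_threads_exec(argv=None, workers=4):
    """Run the sequence once and return the exit status of the exec'ed child."""
    if argv is None:
        # Assumption: a trivial interpreter run replaces /bin/mv from the report.
        argv = [sys.executable, "-c", "pass"]
    start_workers(workers)              # pthread_create (4 or more times)
    pid = os.fork()                     # "main" thread forks
    if pid == 0:
        try:
            start_workers(workers)      # threads recreated in the child
            os.execv(argv[0], argv)     # "main" thread execs
        finally:
            os._exit(127)               # only reached if something above failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

Calling `fork_threads_exec()` in a loop would approximate the "in quick succession" part; according to the report the oops was rare even then, so a clean run proves little.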
Re: critical bugs in md raid5 and ATA disk failure/recovery modes
ation for the drive. What the drive does in many failures is simply tag the block as unreadable (mostly because the checksum/ecc data does not match) and correct this on write. Most drives will also check the surface and allocate a replacement block automatically if required.

> of replacement blocks, and will eventually fail. That is why

Then the drive would be very buggy. If it runs out of replacement blocks it will not suddenly fail, but only be unable to repair the block.

> Linux "forces" early replacement of the disk on any error - it is the
> safest thing to do.

That is certainly untrue. The safest thing to do would doubtlessly be to issue a warning that the disk needs to be replaced but still provide the data for as long as possible, instead of killing the device.

It would certainly make sense to not touch the disk in write mode, or, if one is paranoid, in read mode, but right now the device is simply lost.

> > Of course, but that's supposed to be worked around by using a journaling
> > file system, right?
>
> Nope, journaling is no magical fix for meta data corruption.

Meta data corruption of what? The raid device, then yes; the filesystem, then no.

raid5 works by relying on the error detection of the underlying device. It will suffer from the same kind of corruption that a normal device suffers from, i.e. if data gets corrupted silently, it's gone.

However, in other cases (loud error reporting), the raid device will not corrupt data, as it can always know which data is there and which isn't, just as with a normal disk.

What raid provides is just more redundant data in normal operation - it doesn't suffer from silent data corruption more than a normal disk.

-- Marc Lehmann [EMAIL PROTECTED] http://schmorp.de/
Re: critical bugs in md raid5
On Thu, Jan 27, 2005 at 06:11:34AM +0100, Andi Kleen <[EMAIL PROTECTED]> wrote:
> Marc Lehmann <[EMAIL PROTECTED]> writes:
> > The summary seems to be that the linux raid driver only protects your data
> > as long as all disks are fine and the machine never crashes.
>
> "as long as the machine never crashes". That's correct. If you think
> about how RAID 5 works there is no way around it. When a write to

I disagree. When not working in degraded mode, it's absolutely reasonable to e.g. use only the non-parity data. A crash with raid5 is then in no way different from a crash without raid5: either the old data is on the disk, the new data is on the disk, or you had some catastrophic disk event and no data is on the disk.

The case I reported was not a catastrophic failure: either the old or new data was on the disk, and the filesystem journaling (which is ext3) will take care of it. Even if the parity information is not in sync, either old or new data is on the disk.

> a single stripe is interrupted (machine crash) and you lose a disk
> during the recovery a lot of data (even unrelated to the data just written)
> is lost.

This is not what I described. In fact, I haven't lost any data, despite having had a number of such problems (I did verify that afterwards, and found no differences. Maybe this is luck, but it seems to happen in the majority of cases, and I had a similar problem at least 5 or 6 times because I didn't encounter the bug I reported).

> But that's nothing inherent in Linux RAID5. It's a generic problem.
> Pretty much all Software RAID5 implementations have it.

Indeed, but I think linux' behaviour is especially poor. For example, the renumbering of the devices or the strange rebuild-restart behaviour (which is definitely a bug) will make recovery unnecessarily complicated.

> RAID-1 helps a bit, because you either get the old or the new data,
> but not some corruption.

You don't get any magical corruption with RAID5 either... the data contents will either be old, or new. The difference is that you cannot trust parity.

> In practice even old data can be a big
> problem though (e.g. when file system metadata is affected)

Of course, but that's supposed to be worked around by using a journaling file system, right?

> Moral: if you really care about your data backup very often and
> use RAID-1 or get an expensive hardware RAID with battery backup
> (all the cheap "hardware RAIDs" are equally useless for this)

Yes, I have been thinking of that for some time now, but always had a problem because the affordable ones have low performance. But given linux' effective slower-than-a-single-disk performance it shouldn't be hard to beat nowadays.

There is, however, at least the resyncing with only 4 out of 5 disks, which is doubtlessly a bug somewhere.

-- Marc Lehmann [EMAIL PROTECTED] http://schmorp.de/
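Both positions in this exchange can be checked with a little XOR arithmetic. The following is a toy model only: one stripe of four data blocks plus one parity block (five disks, as in the array discussed), with made-up block contents; `xor_blocks` is a hypothetical helper and none of this reflects how the md driver is actually implemented. It shows that with all disks present every data block is either old or new, that a block rebuilt from stale parity is wrong even though it was never written to, and that a resync before the disk loss avoids this.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four data blocks plus one parity block, i.e. one stripe on five disks.
old_data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(old_data)

# Interrupted stripe write: block 0 reached the disk, the parity update did not.
data = [b"aaaa"] + old_data[1:]

# All disks present: reads use the data blocks only, so each is old or new.
intact_read = data[2]

# Disk 2 lost before a resync: it is rebuilt from the other blocks and stale parity.
rebuilt = xor_blocks([data[0], data[1], data[3], parity])

# After a resync (parity recomputed from the data), the same rebuild is correct.
resynced = xor_blocks([data[0], data[1], data[3], xor_blocks(data)])
```

Here `intact_read` and `resynced` equal the original block 2, while `rebuilt` does not: the damage only appears when a crash and a disk loss coincide before parity is brought back in sync.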
critical bugs in md raid5
never read more than about 25-35MB/s top, which is much less than the speed of a single disk - dd'ing from a single disk gives a speed of >50MB/s, and dd'ing from, say, 4 or 5 disks gives me well over 200MB/s).

Of course, this last issue is not critical at all - I have been living with this problem since the 2.4 days :)

Thanks for all the good work that already went into linux, though!

Hope this helps,

-- Marc Lehmann [EMAIL PROTECTED] http://schmorp.de/
Re: VIA's Southbridge bug: Latest (pseudo-)patch
On Sun, Jun 03, 2001 at 11:10:02PM +0100, Adrian Cox <[EMAIL PROTECTED]> wrote:
> > data corruption was easily detectable, one couldn't even write 500megs
> > without altered bytes).
>
> Wrong way round. You're right that the pci master is supposed to handle
> delayed transactions, but during data transfer the pdc is the pci master
> and the northbridge is the PCI target.

Ok, so it could be the promise controller (the controller, however, worked for a long time in another board with no via chipset and pci delayed transactions enabled, so I guess it is not only dependent on the promise controller).

And this means that there is no automatic workaround, since not all systems seem to have this problem. I *do* hate silent data corruption :()

-- Marc Lehmann [EMAIL PROTECTED]
Re: VIA's Southbridge bug: Latest (pseudo-)patch
On Fri, Jun 01, 2001 at 11:28:48AM -0400, Jeff Garzik <[EMAIL PROTECTED]> wrote:
> Once you get into the area of flushing data (or not flushing, which is
> what delayed txn would imply), it is entirely possible that the driver
> simply does not support what occurs when the PCI Delay Txn option is
> set.

Aren't PCI delayed transactions supposed to be handled by the PCI master (e.g. my northbridge), not by the (software) driver for my pdc?

I would also be surprised if my pdc actually used that feature, not to speak of the fact that the Promise controller + harddisk worked fine in another computer (the data corruption was easily detectable, one couldn't even write 500megs without altered bytes).

-- -==- | ==-- _ | ---==---(_)__ __ ____ __ Marc Lehmann +-- --==---/ / _ \/ // /\ \/ / [EMAIL PROTECTED] |e| -=/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+ The choice of a GNU generation | |
Re: VIA's Southbridge bug: Latest (pseudo-)patch
On Sat, May 19, 2001 at 11:07:21AM +0200, Axel Thimm <[EMAIL PROTECTED]> wrote:
> if( KT133A || KT133 || KX133 ) {
> if( Mainboard=="Epox 8KTA-3(+)" && BIOS>="8kt31417" )
> return 0; /* EPOX already fixed it their way. */
> #ifdef NEW_PATCH
> Offset 76: Set bit5=0 and bit4=1 ("every PCI master grand")
> #else /* this is already part of 2.4.4 */
> Offset 70: Set bit1=0 ("PCI Delay Transaction = 0")

One thing I found out by trial and error is that setting "PCI Delay Transaction" to enabled causes data corruption on WRITE to my IDE drives connected to a Promise Ultra 100 PCI controller. (I didn't get any corruption on the devices connected to the VIA IDE interface, presumably because my BIOS already had the right fix.)

So, while the "every PCI master grant" setting apparently fixes the corruption on the internal VIA IDE interface, the PCI Delay Transaction option must also be buggy (or my Promise controller is), and it causes data corruption at least with an additional Promise Ultra 100.

board: asus cuv4x-d (Apollo MVP3 AGP + via686b southbridge)

-- -==- | ==-- _ | ---==---(_)__ __ __ Marc Lehmann +-- --==---/ / _ \/ // /\ \/ / [EMAIL PROTECTED] |e| -=/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+ The choice of a GNU generation | |
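The register settings in the quoted pseudo-patch are plain bit operations on single configuration bytes. The following is only a sketch of that arithmetic: it assumes the offsets are hexadecimal (0x70 and 0x76) and that each register is one byte wide, neither of which the message states, and the names set_bit and apply_pseudo_patch are made up for illustration. It does not touch any hardware.

```python
def set_bit(value, bit, on):
    """Return an 8-bit register value with one bit forced on or off."""
    mask = 1 << bit
    if on:
        return (value | mask) & 0xFF
    return (value & ~mask) & 0xFF


def apply_pseudo_patch(regs, new_patch):
    """Apply the quoted settings to a copy of the config bytes.

    regs maps offset -> byte value; the offsets 0x70/0x76 are an
    assumption (the message only says "Offset 70" and "Offset 76").
    """
    out = dict(regs)
    if new_patch:
        # Offset 76: bit5=0 and bit4=1
        out[0x76] = set_bit(set_bit(out[0x76], 5, False), 4, True)
    else:
        # Offset 70: bit1=0 ("PCI Delay Transaction = 0")
        out[0x70] = set_bit(out[0x70], 1, False)
    return out


before = {0x70: 0x02, 0x76: 0x20}
print(format(apply_pseudo_patch(before, True)[0x76], "#04x"))   # 0x10
print(format(apply_pseudo_patch(before, False)[0x70], "#04x"))  # 0x00
```

Working on a copy keeps the two variants of the patch comparable against the same starting values.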
Re: 2.4.2 + aic7xxx still broken
On Wed, Feb 28, 2001 at 02:07:30PM +0100, Igor Mozetic <[EMAIL PROTECTED]> wrote:
> 2.4.2 + stock aic7xxx:
> --
> ...
> SCSI host 0 channel 0 reset (pid 0) timed out - trying harder

Interestingly, I have exactly the same problems when booting my SMP kernel with either maxcpus=1, nosmp or the second CPU removed, but NOT when the kernel boots with two CPUs (it works *perfectly*).

Unless maxcpus=1 switches off the APIC (it doesn't), this doesn't look like an IRQ problem, as the BIOS has no idea of the maxcpus=1 option.

One thing that puzzles me is why the new driver looks for db_185.h in /usr/include/db, which seems to be a rather nonstandard position for that header (none of my Slackware or home-grown boxes have that directory; all of them have the db_185.h file in /usr/include, which is the standard location, I'd think, since glibc-2.1 installed it there).

-- -==- | ==-- _ | ---==---(_)__ __ ____ __ Marc Lehmann +-- --==---/ / _ \/ // /\ \/ / [EMAIL PROTECTED] |e| -=/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+ The choice of a GNU generation | |
Re: linux swap freeze STILL in 2.4.x
On Mon, Feb 26, 2001 at 08:11:55AM +0100, Mike Galbraith <[EMAIL PROTECTED]> wrote:
> Hmm.. I remember having this problem and it was a problem with strace.

Well, I obviously strace'd it to find out why I get a memory fault without one (I would be happy if it worked without strace ;->)

> Anyway, it works fine here with virgin 2.4.2, so it seems unlikely it's
> a kernel problem.
> 259 execve("/sbin/losetup", ["losetup", "/dev/loop0", "/dev/hda5"], [/* 47 vars */]) = 0

The -e switch is causing the memory fault and subsequent breakage:

743 open("/dev/hdd", O_RDWR) = 4
743 open("/dev/loop0", O_RDWR) = 5
743 mlockall(0x3, 0x804c272) = 0
743 ioctl(5, LOOP_SET_FD, 0x4) = -1 ENOSYS (Function not implemented)
743 ioctl(5, LOOP_SET_FD, 0x4) = 0
743 ioctl(5, LOOP_SET_STATUS, 0xb5d8) = -1 ENOSYS (Function not implemented)
743 ioctl(5, LOOP_SET_STATUS, 0xb5d8) = -1 ENOSYS (Function not implemented)
743 ioctl(5, LOOP_SET_STATUS, 0xb5d8) = -1 ENOSYS (Function not implemented)
743 ioctl(5, LOOP_SET_STATUS, 0xb5d8) = -1 ENOSYS (Function not implemented)
743 ioctl(5, LOOP_SET_STATUS <unfinished ...>
743 +++ killed by SIGSEGV +++

(which is a strange strace anyway...)

However, I just need to wait until there is a new crypto patch (and, if not, I'll eventually have to hack it myself to get my data. After all, it's source...)

-- -==- | ==-- _ | ---==---(_)__ __ __ Marc Lehmann +-- --==---/ / _ \/ // /\ \/ / [EMAIL PROTECTED] |e| -=/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+ The choice of a GNU generation | |
Re: linux swap freeze STILL in 2.4.x
Oh, and one last thing I forgot: loop devices. Since 2.4.1 (the first version I used) through 2.4.2 and 2.4.2ac3 I only get:

cerebro:~# strace -f -o x losetup -e rc6 /dev/loop0 /dev/hdd
Memory Fault

And then no access to the loop device works anymore. (Clearly this is with the 2.4.0.something crypto patch applied, so this is probably not a 2.4.2 issue anyway, since there is no 2.4.2 crypto patch.)

Happy Hacking ;)

-- -==- | ==-- _ | ---==---(_)__ __ __ Marc Lehmann +-- --==---/ / _ \/ // /\ \/ / [EMAIL PROTECTED] |e| -=/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+ The choice of a GNU generation | |
Re: linux swap freeze STILL in 2.4.x
On Sun, Feb 25, 2001 at 05:58:32PM +0100, Mike Galbraith <[EMAIL PROTECTED]> wrote:
> Signal delivery during oomest does not work (last time I tested).
> Andrea fixed this once.. long time ~problem.

Hmm, here is something that is new: just now, the machine gets VERY very sluggish and swaps:

             total    used    free  shared  buffers  cached
Mem:        255296  253708    1588       0    29808  183020
-/+ buffers/cache:   40880  214416
Swap:            2       2       0

Now, there is plenty of free memory (200 megs!) but no swap space, and the kernel keeps swapping. The only interesting processes here are:

  PID TTY    STAT TIME MAJFL  TRS   DRS        RSS   %MEM COMMAND
  112 ?      S    0:00   742 1366 38921       3460    1.3 /opt/mysql//libexec/mysqld --basedir=/opt/mysql/ --datadir=/var/mysql --user=root --pid-
  205 ?      S    2:28 12335 1444 27167 4294966180 6728.9 /usr/bin/X11/X :0 -audit 1 -auth /etc/cfg/Xauthority -a 2 -once -t 5 vt02 -defer
  421 pts/13 TN   1:00   804  707 31552      17444    6.8 /usr/bin/perl ./summarize
  376 pts/10 R    7:07   269  129 22614       1852    0.7 rsync -av . doom cerebro-root/. --delete

When I SIGSTOP the summarize script (which uses mysql very intensively) the system starts to work again, but the memory situation does not improve.

The RSS size of X puzzles me a bit, but this was always the case under 2.4.2 and 2.4.2ac3 (and maybe before) and didn't cause a problem before.

Another bug I found is that initializing md on the kernel command line in the wrong order (first md1, then md0) keeps the kernel from mounting md0 as the root device.

Another problem is that, when I "startraid /dev/md1" (a two-partition, striped raid without persistent superblock), I get strange errors in /var/log/kernel (if anybody asks I'll provide them), but it works fine when I use md=x on the kernel command line. It's not a configuration problem, since I got the same strange problems with the mdstart I used successfully under 2.1 and 2.2.

Another nitpick is kernel-pcmcia: for some unexplainable reason, the kernel SWITCHES OFF POWER to the pcmcia slots BEFORE notifying apmd, which then tries to save important data and locks up (not the machine, just the script), since the network is suddenly dead although the interface etc. all still exist. Under the pcmcia-cs package one could work around this bug by specifying do_apm=0 for the pcmcia_core module, which has no effect under 2.4.

So I keep asking myself: does anybody actually use 2.4 on production machines? ;->

(Historically, it seems that my machines tend to freeze easily because of sudden OOM and/or reiserfs ;)

-- -==- | ==-- _ | ---==---(_)__ __ ____ __ Marc Lehmann +-- --==---/ / _ \/ // /\ \/ / [EMAIL PROTECTED] |e| -=/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+ The choice of a GNU generation | |
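The "200 megs free" figure can be checked against the Mem: line of the free output quoted in this message. As a small worked example (the function name is invented here, and the formula is the usual one for the "-/+ buffers/cache" line of the older free(1), not something the message spells out):

```python
def buffers_cache_line(used, free, buffers, cached):
    """Recompute the '-/+ buffers/cache' line from the 'Mem:' line (all in kB)."""
    used_by_programs = used - buffers - cached
    available_to_programs = free + buffers + cached
    return used_by_programs, available_to_programs


# Values taken from the 'Mem:' line quoted in the message.
print(buffers_cache_line(used=253708, free=1588, buffers=29808, cached=183020))
# prints (40880, 214416)
```

Both numbers match the quoted "-/+ buffers/cache" line, so roughly 209 MB of the 249 MB in use is buffers and page cache rather than process memory.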
Re: linux swap freeze STILL in 2.4.x
On Sun, Feb 25, 2001 at 05:58:32PM +0100, Mike Galbraith <[EMAIL PROTECTED]> wrote:
> > Usually I swapon ./swap some 512MB swapfile, but today I forgot it. When the
> > machine started to get sluggish I sent the process a -STOP signal.
>
> Signal delivery during oomest does not work (last time I tested).
> Andrea fixed this once.. long time ~problem.

Well, the signal delivery seemed to have worked fine: the machine was quite usable (it swapped a lot, but the system was never unusable for longer than a second or so).

The problem started when I did the swapon. Well, it didn't start, the system just froze.

-- -==- | ==-- _ | ---==---(_)__ __ __ Marc Lehmann +-- --==---/ / _ \/ // /\ \/ / [EMAIL PROTECTED] |e| -=/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+ The choice of a GNU generation | |
linux swap freeze STILL in 2.4.x
It seems linux-2.4 still freezes on out-of-memory situations:

I was using 2.4.2-ac3 SMP and had a fairly large background job that takes hundreds of megabytes of memory, much more than I have:

Mem:   255296  81836 173460      0  10324  30608
Swap:       2      0      2

Usually I swapon ./swap (some 512MB swapfile), but today I forgot it. When the machine started to get sluggish I sent the process a -STOP signal.

Swap:       2      2      0

OK, I had about 12MB of main memory free (in the -/+ buffers/cache line of free) and the machine was sluggish but workable for about five minutes.

The instant I did a swapon ./swap, the machine froze hard (no sysrq, no ping etc...)

I thought these complete freezes in OOM situations had been fixed in 2.4.x? Do I have to watch out for andrea's fix-2.4-oom patches? ;)

-- -==- | ==-- _ | ---==---(_)__ __ __ Marc Lehmann +-- --==---/ / _ \/ // /\ \/ / [EMAIL PROTECTED] |e| -=/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+ The choice of a GNU generation | |
major security bug in reiserfs (may affect SuSE Linux)
We are still investigating, but there seems to be a major security problem in at least some versions of reiserfs. Since reiserfs is shipped with newer versions of SuSE Linux, and the problem is too easy to reproduce and VERY dangerous, I think alerting people to this problem is in order.

We have tested and verified this problem on a number of different systems and kernels 2.2.17/2.2.8 with reiserfs-3.5.28, and probably other versions.

Basically, you do:

mkdir "$(perl -e 'print "x" x 768')"

i.e. create a directory with a very long name. The name itself doesn't seem to be of relevance (we found this out by doing mkdir "$(cat /etc/hosts)" for other tests). This works. The next ls (or echo *) command will segfault and the kernel oopses. All following accesses to the volume in question will oops and hang the process, even after a reboot.

reiserfsck (the filesystem check program) does _NOT_ detect or solve this problem:

Replaying journal..ok
Checking S+tree..ok
Comparing bitmaps..ok

But fortunately, rmdir <filename> works and seems to leave the filesystem undamaged.

Since a kernel oops results (see below), this indicates a buffer overrun inside the kernel (the kernel jumps to address 78787878, which is "xxxx"), which is of course very nasty (think ftp-upload!) and certainly gives you root access from anywhere, even from inside a chrooted environment. We didn't pursue this further.

The best workaround at this time seems to be to uninstall reiserfs completely or to not allow any user access (even indirect) to these volumes. While this individual bug might be easy to fix, we believe that other, similar bugs should be easy to find, so reiserfs should not be trusted (it shouldn't be trusted with full user access for other reasons anyway, but it is still widely used).

Unable to handle kernel paging request at virtual address 78787878
current->tss.cr3 = 0d074000, %cr3 = 0d074000
*pde =
Oops: 0002
CPU: 0
EIP: 0010:[<c013f875>]
EFLAGS: 00010282
eax: ebx: bfffe78c ecx: edx: bfffe78c
esi: ccbddd62 edi: 78787878 ebp: 0300 esp: ccbddd3c
ds: 0018 es: 0018 ss: 0018
Process bash (pid: 292, process nr: 54, stackpage=ccbdd000)
Stack: c013f66a ccbddf6c cd10 ccbddd62 030c c0136d49 0700 2013
       1000 7878030c 78787878 78787878 78787878 78787878 78787878 78787878
       78787878 78787878 78787878 78787878 78787878 78787878 78787878 78787878
Call Trace: [<c013f66a>] [<c0136d49>]
Code: 89 1f 8b 44 24 18 29 47 08 31 c0 5b 5e 5f 5d 81 c4 2c 01 00

-- -==- | ==-- _ | ---==---(_)__ __ __ Marc Lehmann +-- --==---/ / _ \/ // /\ \/ / [EMAIL PROTECTED] |e| -=/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+ The choice of a GNU generation | |
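The link between the faulting address and the directory name is a matter of reading the address as ASCII bytes. A minimal sketch of that check (the helper name is made up; nothing here touches a filesystem):

```python
def decode_fault_address(hex_address):
    """Interpret a 32-bit hex address as four ASCII bytes."""
    return bytes.fromhex(hex_address).decode("ascii")


name = "x" * 768  # the same name the perl one-liner in the message builds
print(decode_fault_address("78787878"))          # xxxx
print(decode_fault_address("78787878") in name)  # True
```

Because all four bytes are identical, byte order does not matter here; the address and the repeated stack words in the oops are simply runs of the character "x" from the name.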
Re: `rmdir .` doesn't work in 2.4
On Tue, Jan 09, 2001 at 02:55:15AM +0100, Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> > [wakko@:/home/wakko/test] rmdir "`pwd`"
> > rmdir: /home/wakko/test: Invalid argument
>
> Some other OS with a yet different retval? :)

It can be much worse (irix-6.5.4):

bash# mkdir x; cd x; rmdir "`pwd`"
/x: Can't remove current directory or ..

Here the error message makes sense, but is totally wrong in this case :(

And here is linux-2.2.18:

cerebro:~# mkdir x; cd x;rmdir "`pwd`"
cerebro:~/x# ls -la
total 6
drwxr-x--- 0 root root 35 Jan 9 05:54 .
drwx-- 69 root root 5372 Jan 9 05:54 ..
cerebro:~/x# cd
cerebro:~# ls -la x
ls: x: No such file or directory

So, no, linux certainly does NOT remove "." ;)

-- -==- | ==-- _ | ---==---(_)__ __ __ Marc Lehmann +-- --==---/ / _ \/ // /\ \/ / [EMAIL PROTECTED] |e| -=/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+ The choice of a GNU generation | |
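For anyone who wants to repeat the comparison on another system, the two variants (rmdir "." and rmdir "`pwd`") can be tried from a scratch directory. This is only a sketch: the function names are invented here, and the result depends on the kernel and filesystem it runs on, so no particular output is claimed for the systems discussed in the thread.

```python
import errno
import os
import tempfile


def rmdir_outcome(target):
    """Try os.rmdir(target); return 'removed' or the symbolic errno name."""
    try:
        os.rmdir(target)
        return "removed"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))


def experiment():
    """Run rmdir on "." and on the absolute path of the current directory."""
    start = os.getcwd()
    scratch = tempfile.mkdtemp()
    try:
        os.chdir(scratch)
        by_dot = rmdir_outcome(".")
        by_path = rmdir_outcome(scratch)
    finally:
        # Leave the (possibly removed) directory and clean up if needed.
        os.chdir(start)
        if os.path.isdir(scratch):
            os.rmdir(scratch)
    return by_dot, by_path


print(experiment())
```

POSIX requires rmdir to fail with EINVAL when the last path component is ".", so the first element is the less interesting one; the second shows what the system does when the working directory is named by its full path.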
Re: ramfs problem... (unlink of sparse file in "D" state)
On Mon, Jan 08, 2001 at 01:33:50PM -0500, Alexander Viro <[EMAIL PROTECTED]> wrote:
> And prefix would be what? "/"? Besides, I said that you don't have
> read permissions on /foo, not search ones.

You do not need read permissions on /foo to call pathconf on it. This makes sense: you are not reading the directory...

-- -==- | ==-- _ | ---==---(_)__ __ __ Marc Lehmann +-- --==---/ / _ \/ // /\ \/ / [EMAIL PROTECTED] |e| -=/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+ The choice of a GNU generation | |
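The claim about pathconf can be tried on a directory that is searchable but not readable. A minimal sketch, assuming a system where Python exposes PC_NAME_MAX through os.pathconf (the directory name "foo" mirrors the thread; the function name is invented). When run as root the permission bits are not enforced, so the test is only meaningful for an ordinary user.

```python
import os
import stat
import tempfile


def name_max_without_read_permission():
    """Call pathconf on a directory with mode --x------ (search, no read)."""
    parent = tempfile.mkdtemp()
    target = os.path.join(parent, "foo")
    os.mkdir(target)
    os.chmod(target, stat.S_IXUSR)  # searchable, but not readable
    try:
        return os.pathconf(target, "PC_NAME_MAX")
    finally:
        # Restore permissions so the scratch directories can be removed.
        os.chmod(target, stat.S_IRWXU)
        os.rmdir(target)
        os.rmdir(parent)


print(name_max_without_read_permission())
```

The call only asks the filesystem for a limit; it never lists the directory's entries, which is the operation that read permission actually guards.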
Re: Journaling: Surviving or allowing unclean shutdown?
On Sat, Jan 06, 2001 at 03:35:02PM -0500, Chris Mason <[EMAIL PROTECTED]> wrote:
> > Nobody with working brain would read it completely into memory.

Instead everybody with a working brain would introduce another hashing layer
for every block access? I don't think the reiserfs code (e.g.) would cope
with yet another complication in the code ;)

--
Marc Lehmann [EMAIL PROTECTED]
Re: Journaling: Surviving or allowing unclean shutdown?
On Fri, Jan 05, 2001 at 11:58:56AM +, David Woodhouse <[EMAIL PROTECTED]> wrote:
> You mount it read-only, recover as much as possible from it, and bin it.
>
> You _don't_ want the fs code to ignore your explicit instructions not to
> write to the medium, and to destroy whatever data were left.

The problem is: where did you give the explicit instruction? Just because
you define "read-only" as "the medium should not be written" does not mean
everybody else thinks the same.

Actually, I regard "ro" mainly as a "hey kernel, I won't handle writes now,
so please don't try it", like for cd-roms or other non-writable media, and
"please, filesystem, stay in a clean state".

That ro means "the medium is never written" is an assumption that does not
hold for most disks anyway and is, in the case of journaling filesystems,
often impossible to implement. You simply can't salvage data without a log
replay. Sure, you can do virtual log replays, but for example the reiserfs
log is currently 32MB. Pinning down that much memory for a virtual log
replay is not possible on low-memory machines.

So the first thing would be to precisely define the meaning of the "ro"
flag. Until this has happened it is absolutely senseless to argue about
what it means, as it doesn't mean anything at the moment, except (man
mount):

   ro     Mount the file system read-only.

Which it does even with journaling filesystems...

--
Marc Lehmann [EMAIL PROTECTED]
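To make the "virtual log replay" idea concrete, here is a toy model of it. It is a sketch under loud assumptions: the class, its dict-based "device" and the journal record layout are invented for illustration and have nothing to do with the actual reiserfs code or on-disk format.

```python
class VirtualReplayDevice:
    """Read-only view of a block device with journal contents layered on top.

    `base` maps block number -> bytes and stands in for the on-disk state.
    `journal` is a list of (block number, bytes) records in commit order.
    Both are illustrative stand-ins, not a real filesystem format.
    """

    def __init__(self, base, journal):
        self._base = base
        # The extra lookup layer: newest journaled copy of each block, in RAM.
        self._overlay = {}
        for blockno, data in journal:
            self._overlay[blockno] = data

    def read_block(self, blockno):
        # Every read has to consult the overlay before touching the medium.
        if blockno in self._overlay:
            return self._overlay[blockno]
        return self._base[blockno]

    def pinned_bytes(self):
        # Memory that stays allocated for as long as the volume is mounted.
        return sum(len(data) for data in self._overlay.values())
```

The medium itself is never written, but the price is the one named in the mail: every block read goes through the extra lookup, and the journaled blocks stay pinned in memory (`pinned_bytes()`), which for a log of tens of megabytes is a lot on a small machine.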
time function problems with 2.2.18 / hang
I have an error that occurs after upgrading from 2.2.18pre23 to 2.2.18 +
vm-global-7 patch. Apart from enhanced stability in low-memory cases (hey,
it doesn't freeze ten times a day ;), I have the problem that once every
few days, preferably under high load, X behaves strangely (window manager
shows no reaction, mouse works OR mouse cursor stops moving OR wm works,
mouse works but rxvt's stop working etc.).

When this happens I can still log in via the network and run commands, but
every command that waits (select(0,0,0,xxx) or nanosleep) just hangs:

   cerebro:~# strace -f sleep 1
   ...
   nanosleep({1, 0},

Also, when I beep the terminal it starts beeping but never stops, so it
seems the timer system inside the kernel is somehow wrecked in this state.
Doing

   while :;do kill -CONT -1;done

lets me do some things, like running top or killing and restarting X (very
slowly ;).

That is the strangest thing I ever saw in a release kernel ;)

--
Marc Lehmann [EMAIL PROTECTED]
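A small probe for the symptom described above might look like the following. It is only a sketch: the function name and the 50 ms default are arbitrary choices, and on a machine in the broken state the calls would presumably never return, so it would have to be run under some external watchdog.

```python
import select
import time


def timed_waits(seconds=0.05):
    """Time one sleep and one select()-based wait of the same length."""
    results = {}

    start = time.monotonic()
    time.sleep(seconds)  # ends up in a nanosleep-style kernel wait
    results["sleep"] = time.monotonic() - start

    start = time.monotonic()
    select.select([], [], [], seconds)  # no descriptors, timeout only
    results["select"] = time.monotonic() - start

    return results
```

On a healthy system both entries come out close to the requested duration; a wait that is much longer, or that never finishes, would point at the kernel timer path rather than at X.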
Re: recursive exports && linux nfs
On Fri, Dec 15, 2000 at 11:54:46PM +0100, Pavel Machek <[EMAIL PROTECTED]> wrote:
> > 2) using: I can do cd /nfs/fs, but the directory is always empty, and when I
> >    try to step into a subdirectory I always get "No such file or directory".
> >
> > Thanks a lot for any insights, even if this means "this is not supported"
> > ;)
>
> This can't be supported, afaict, because nfs handles have limited
> size.

Ehrm, did you really read my mail? Most people told me something like
"recursive exports are not supported" (actually, they are and they work),
and it seems nobody really read what I wrote :(

My problem is that autofs doesn't work. Example:

   /         reiserfs
   /fs       autofs
   /fs/big   ext2

When I exportfs /, /fs AND /fs/big then I can mount /fs on another box,
but it is always empty, even if something (e.g. /fs/big) is mounted and can
be accessed fine the whole time. Automounting doesn't work, either, of
course.

Another (less grave) problem is that exportfs (and/or rpc.nfsd) require
network access and access to the volume, so they a) mount all automounted
directories (VERY expensive) and b) require network access (making all
clients NOT survive a reboot).

--
Marc Lehmann [EMAIL PROTECTED]
recursive exports && linux nfs
Hi ;)

I am trying to export the whole filesystem hierarchy on one of my servers
(this includes /fs, which is an automounted directory using autofs).

Now I have two problems:

1) exporting: exportfs does not really export filesystems that are not
   present when exportfs is being called (some of my filesystems are only
   available temporarily). Also, exportfs of course forces the mount of all
   filesystems that are mountable, which can take considerable time.

2) using: I can do cd /nfs/fs, but the directory is always empty, and when I
   try to step into a subdirectory I always get "No such file or directory".

I am using linux-2.2.18, nfsv3 + nfs-utils-0.2.1.

Thanks a lot for any insights, even if this means "this is not supported" ;)

--
Marc Lehmann [EMAIL PROTECTED]
reordering pci interrupts?
I have a motherboard with a broken bios that is unable to set interrupts
correctly, i.e. it initializes the devices correctly but swaps the
interrupts for slot1/slot3 and slot2/slot4.

Now, is there a way to forcefully re-order the pci interrupts? I do not have
an io-apic (thus no pirq=xxx), and I tried to poke the interrupt values
directly into /proc/bus/pci/*/*, but the kernel has its own idea.

Thanks a lot for any info (I guess I'll just patch the kernel).

--
Marc Lehmann [EMAIL PROTECTED]
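For reference, the values being poked here live at fixed offsets in the standard PCI configuration header: Interrupt Line at 0x3C and Interrupt Pin at 0x3D. The sketch below only reads them; the function names are made up, and the file path passed to the second helper is whatever entry under /proc/bus/pci belongs to the device in question.

```python
def interrupt_fields(config):
    """Return (interrupt_line, interrupt_pin) from a PCI configuration header.

    Offsets 0x3C (Interrupt Line) and 0x3D (Interrupt Pin) are those of the
    standard PCI header layout.
    """
    if len(config) < 0x3E:
        raise ValueError("need at least the first 62 bytes of config space")
    return config[0x3C], config[0x3D]


def read_interrupt_fields(path):
    # `path` is assumed to be a per-device file below /proc/bus/pci.
    with open(path, "rb") as handle:
        return interrupt_fields(handle.read(64))
```

The Interrupt Line register is just a storage byte that firmware fills in and the operating system reads; the device does not route interrupts by it. If the kernel copies the value once while probing the bus, which is an assumption here rather than something stated in the mail, a later write through /proc would change the register but not the IRQ the drivers already use.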
routing problems with 2.2
The Problem: the command "telnet 212.172.23.17 80", done from a machine
outside my network, generates syn requests on the device tun2 on my machine
(a tunnel device using vtun).

tcpdump tun2:

00:04:55.066516 12.4.218.41.4624 > 212.172.23.17.80: S 219810852:219810852(0) win 16384 <mss 1460,nop,wscale 0,nop,nop,timestamp[|tcp]> (DF) [tos 0x10]
00:04:55.119757 129.13.162.254 > 212.172.23.17: icmp: host 12.4.218.41 unreachable - admin prohibited filter

(the second packet is due to the misrouting of the return packet on the
interface tun1, which hits some firewall):

00:04:55.066779 212.172.23.17.80 > 12.4.218.41.4624: S 437426418:437426418(0) ack 219810853 win 15510 <mss 1410,nop,nop,timestamp 7186830[|tcp]> (DF)
00:04:58.100986 212.172.23.17.80 > 12.4.218.41.4624: S 437426418:437426418(0) ack 219810853 win 15510 <mss 1410,nop,nop,timestamp 7187134[|tcp]> (DF)

The problem is that everything works fine at first, but some time after
starting the network tunnels (between 5 minutes and a few days!) packets
received on one interface get sent out on another one, generally the wrong
one. ifconfig down/up of the device usually works (it happens between
tun1/tun2, tun2/ippp0 and even ippp0 and eth1, for example).

Does anybody have an idea what's going wrong here, and how to fix this?

Thanks a lot in advance, I'd be happy to provide more info.

My config: linux-2.2.17 with most advanced router functions enabled (I can
send my .config if necessary).

doom:~# ip rule list
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default
doom:~# ip route list table local
local 10.0.0.5 dev eth0  proto kernel  scope host  src 10.0.0.5
local 10.0.0.5 dev eth1  proto kernel  scope host  src 10.0.0.5
broadcast 127.255.255.255 dev lo  proto kernel  scope link  src 127.0.0.1
broadcast 193.0.0.0 dev ippp0  proto kernel  scope link  src 62.224.169.116
local 62.224.169.116 dev ippp0  proto kernel  scope host  src 62.224.169.116
broadcast 10.255.255.255 dev eth0  proto kernel  scope link  src 10.0.0.5
broadcast 10.255.255.255 dev eth1  proto kernel  scope link  src 10.0.0.5
broadcast 193.255.255.255 dev ippp0  proto kernel  scope link  src 62.224.169.116
broadcast 127.0.0.0 dev lo  proto kernel  scope link  src 127.0.0.1
local 127.0.0.1 dev lo  proto kernel  scope host  src 127.0.0.1
local 129.13.162.92 dev tun1  proto kernel  scope host  src 129.13.162.92
local 127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1
doom:~# ip route list table main
192.168.255.202 dev tun1  proto kernel  scope link  src 129.13.162.92
10.0.0.1 dev eth0  scope link
212.172.23.18 via 10.0.0.1 dev eth0
192.168.254.1 dev tun2  proto kernel  scope link  src 212.172.23.17
10.0.0.2 dev eth1  scope link
129.13.162.8 dev ippp0  scope link
10.0.0.9 dev eth1  scope link
129.13.162.93 via 10.0.0.1 dev eth0
172.16.0.0/12 dev tun1  scope link
193.0.0.0/8 dev ippp0  proto kernel  scope link  src 62.224.169.116
default dev ippp0  scope link
default via 193.158.133.205 dev ippp0
doom:~# ip route list table default
[empty]
doom:~# ip link list
1: lo: <LOOPBACK,UP> mtu 3924 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ippp0: <POINTOPOINT,NOARP,UP> mtu 1500 qdisc pfifo_fast qlen 30
    link/ppp
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:e0:7d:03:38:73 brd ff:ff:ff:ff:ff:ff
4: eth1: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:e0:7d:03:38:68 brd ff:ff:ff:ff:ff:ff
29: tun1: <POINTOPOINT,MULTICAST,NOARP,UP> mtu 1450 qdisc pfifo_fast qlen 10
    link/ppp
30: tun2: <POINTOPOINT,MULTICAST,NOARP,UP> mtu 1450 qdisc pfifo_fast qlen 10
    link/ppp
doom:~# ip address list
1: lo: <LOOPBACK,UP> mtu 3924 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: ippp0: <POINTOPOINT,NOARP,UP> mtu 1500 qdisc pfifo_fast qlen 30
    link/ppp
    inet 62.224.169.116 peer 193.158.133.205/8 scope global ippp0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:e0:7d:03:38:73 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.5/32 brd 10.255.255.255 scope global eth0
4: eth1: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:e0:7d:03:38:68 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.5/32 brd 10.255.255.255 scope global eth1
29: tun1: <POINTOPOINT,MULTICAST,NOARP,UP> mtu 1450 qdisc pfifo_fast qlen 10
    link/ppp
    inet 129.13.162.92 peer 192.168.255.202/32 scope global tun1
30: tun2: <POINTOPOINT,MULTICAST,NOARP,UP> mtu 1450 qdisc pfifo_fast qlen 10
    link/ppp
    inet 212.172.23.17 peer 192.168.254.1/32 scope global tun2
    inet 212.172.23.21/32 scope global tun2

--
Marc Lehmann [EMAIL PROTECTED]
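As a cross-check of what the main table listed in this message would select, here is a simplified longest-prefix-match lookup over those routes. It is a sketch, not the kernel's algorithm: it ignores the local table, the route cache, source addresses and the `via` gateways, and the function name is invented.

```python
import ipaddress

# (prefix, device) pairs copied from "ip route list table main" above.
# The two default routes both point at ippp0, so one entry stands for both.
MAIN_TABLE = [
    ("192.168.255.202/32", "tun1"),
    ("10.0.0.1/32", "eth0"),
    ("212.172.23.18/32", "eth0"),
    ("192.168.254.1/32", "tun2"),
    ("10.0.0.2/32", "eth1"),
    ("129.13.162.8/32", "ippp0"),
    ("10.0.0.9/32", "eth1"),
    ("129.13.162.93/32", "eth0"),
    ("172.16.0.0/12", "tun1"),
    ("193.0.0.0/8", "ippp0"),
    ("0.0.0.0/0", "ippp0"),
]


def lookup(dest, table=MAIN_TABLE):
    """Return the device of the most specific route matching `dest`."""
    addr = ipaddress.ip_address(dest)
    best = None
    for prefix, dev in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, dev)
    return None if best is None else best[1]
```

Under this simplified model, `lookup("12.4.218.41")` falls through to the default route on ippp0, so the table as listed would not send the reply out via tun1, where the dump shows it ending up.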
Re: Dual XEON - >>SLOW<< on SMP
On Sun, Nov 12, 2000 at 11:22:02PM -0700, "Jeff V. Merkey" <[EMAIL PROTECTED]> wrote:
> I can go and get the text from our discussion, and I distinctly remember
> your answer to this question on PII and you said "lots". This was also a

Well, my mail certainly contained the words "lot" (not "lots") and "PII",
but certainly not in the same sentence, certainly not referring to each
other and certainly not referring to syscalls, and I am totally puzzled
why you keep claiming this in public (you can't even quote my name
correctly).

Could you please stop lying and hopefully apologize for abusing my name in
public for claiming wrong things I never said and abstain from doing so in
the future?

And please keep this off-list from now on.

--
Marc Lehmann [EMAIL PROTECTED]
Re: Dual XEON - >>SLOW<< on SMP
On Tue, Nov 07, 2000 at 04:03:25PM -0700, "Jeff V. Merkey" <[EMAIL PROTECTED]> wrote:
>
> Marc Lehman verified that PII systems will generate tons of AGIs with
> gcc.

It is a bit late (just came back from the systems'00 fair), but Jeff Merkey
just acknowledged that indeed he meant me with "Marc Lehman".

I have no idea why he wrote such a thing, since I never mentioned something
like that, nor did I verify anything like this (given that the sentence
doesn't make much sense, either).

Jeff, I never said such a thing and I would appreciate it if you didn't put
your words into my mouth.

*puzzled*

--
Marc Lehmann [EMAIL PROTECTED]
Re: non-gcc linux?
On Sun, Nov 05, 2000 at 04:05:05PM -0700, Tim Riker <[EMAIL PROTECTED]> wrote:
> > Which can not and will not happen.
>
> I understand "will not", but "can not"? There is nothing stopping

As I explained three lines further down in the mail, if you care to read.

> would include copyrights assigned to FSF and other parties. Let's say
> this happens and a new sgigcc source base is created. Presumably then

We recently saw that creating a new, probably incompatible compiler is a
very bad thing. If sgi would split the compiler that would be a problem for
the community at large.

> any defense of gcc code could be met with the argument that the code
> used came from sgigcc

YANAL and IANAL, but to defend code you must own it or have authored it.
Since the FSF would, in your example, neither own the code nor be the author
of it they couldn't defend that version of gcc.

> This being the case what has the FSF gained by

Well, simply this is _not_ the case ;)

> In short, I do not see any enforceable advantages to the current FSF

You don't. Lawyers do (certainly the FSF lawyer does), and probably the law
does, also ;)

> Statements above are my own, and I am not a lawyer.

Yepp.

--
Marc Lehmann [EMAIL PROTECTED]
Re: non-gcc linux?
On Sun, Nov 05, 2000 at 04:06:37PM -0500, Jakub Jelinek <[EMAIL PROTECTED]> wrote:
> That's hard to do, because the whole gcc has copyright assigned to FSF,
> which means that either gcc steering committee would have to make an
> exception from this

Which can not and will not happen.

> for SGI, or SGI would have to be willing to assign some code to FSF.

Which is the standard procedure that the FSF requires for all its programs
to be able to defend them - incorporating non-assigned code into gcc creates
some intractable problems (i.e. makes it impossible) should the FSF ever
want to go to court to defend the freedom of gcc.

--
Marc Lehmann [EMAIL PROTECTED]
Re: select() bug
On Thu, Nov 02, 2000 at 11:55:52PM +, Alan Cox <[EMAIL PROTECTED]> wrote:
> > - If I'm correct that pipes have a 4K kernel buffer, then writing 1
> > byte shouldn't cause this situation, as the buffer is well more than
> > half empty. Is this still a bug?
>
> The pipe code uses totally full/empty. Im not sure why that was chosen

Just a quick guess: maybe because of the POSIX atomicity guarantees (if select returned while the buffer was only partly free, an atomic write might still have to block, which is not what is expected). Maybe the same limitation was then applied to the read side as well as the write side, although it is not necessary on the read side, AFAIK.

--
Marc Lehmann
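What select() reports for a pipe differs between kernels, so it is easiest to probe the running system. The following is a minimal Python sketch, not code from this thread: the helper names `pipe_writable`, `fill_pipe` and `probe` are made up for illustration, and it assumes a POSIX system where Python exposes `select.PIPE_BUF`. It fills a pipe with PIPE_BUF-sized non-blocking writes (the size up to which POSIX requires a write to be atomic), then asks select() about the write end when the pipe is empty, when it is full, and after a single byte has been read.

```python
import os
import select


def pipe_writable(write_fd):
    """Return True if select() reports the write end as writable right now."""
    # Timeout 0 means "poll once and return immediately".
    _, writable, _ = select.select([], [write_fd], [], 0)
    return write_fd in writable


def fill_pipe(write_fd):
    """Write PIPE_BUF-sized chunks until the kernel refuses more data.

    Returns the number of bytes the pipe accepted.
    """
    os.set_blocking(write_fd, False)
    chunk = b"x" * select.PIPE_BUF  # at most PIPE_BUF bytes: atomic per POSIX
    total = 0
    while True:
        try:
            total += os.write(write_fd, chunk)
        except BlockingIOError:
            # No room left for another atomic chunk.
            return total


def probe():
    """Check select() on an empty pipe, a full pipe, and after reading 1 byte."""
    read_fd, write_fd = os.pipe()
    try:
        empty_writable = pipe_writable(write_fd)
        capacity = fill_pipe(write_fd)
        full_writable = pipe_writable(write_fd)
        os.read(read_fd, 1)  # free a single byte, as in the original report
        after_one_byte = pipe_writable(write_fd)
        return {
            "empty_writable": empty_writable,
            "capacity": capacity,
            "full_writable": full_writable,
            "writable_after_reading_one_byte": after_one_byte,
        }
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    for key, value in probe().items():
        print(f"{key}: {value}")
```

The last value is the interesting one and is deliberately not given an expected result here: whether one freed byte is enough for select() to report the pipe as writable again is exactly the kernel policy being discussed, and it depends on the system the sketch runs on.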
Re: What is up with Redhat 7.0?
On Tue, Oct 03, 2000 at 01:27:36PM +0200, Jes Sorensen <[EMAIL PROTECTED]> wrote:
> Doesn't do much good if one of the compilers generates bogus output,
> but obviously you never had to deal with the bug reports coming out of
> distributors shipping $#@%$# pgcc as their default compiler.

I did, but of course not with all such distributions and bug reports.

> Looks to me like Alan's plonk was very appropriate here.

No, what Alan did was a show of bad taste, or bad mood, or whatever. This discussion simply does not belong here and has nothing to do with the now off-topic discussion about binary incompatibility. As such, what Alan did was a cheap trick to try to draw attention away from the real problem. He didn't succeed, of course, and I only accuse him of a temporary bad mood, which I can certainly live with ;)

On Tue, Oct 03, 2000 at 01:38:01PM +0200, Jes Sorensen <[EMAIL PROTECTED]> wrote:
> release? Maybe you should stop insulting the people who are actually
> doing the Free Software work

Like myself??

> who just happens to be paid by Red Hat.

Only a very small part, actually. That means that everybody should play well together, rather than trying to force non-standards onto others.

> glibc-2.2 was put out as a release candidate. gcc on the other hand I
> don't expect to see being released anytime soon enough for it to make
> sense (I might be wrong),

FYI: gcc has already been "released" for quite some time.

> binary compat problems, so far nobody has even been able to agree on
> the naming scheme of the shared libstdc++ package, we just have to
> wait for 3.0.

Unfortunately some company couldn't wait. The higher numbers probably...

--
Marc Lehmann
Re: What is up with Redhat 7.0?
On Sun, Oct 01, 2000 at 09:33:31PM -0400, Horst von Brand <[EMAIL PROTECTED]> wrote:
> > many others.
>
> What makes Debian's package management "reasonable" where others aren't?

This *really* doesn't belong on linux-kernel.

--
Marc Lehmann
Re: compiler explodes on pegasus driver in 2.4.0-test8
On Sun, Oct 01, 2000 at 07:01:45PM -0400, Robert Dale <[EMAIL PROTECTED]> wrote:
> gcc -D__KERNEL__ -I/usr/src/linux-2.4.0-test8/include -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -pipe -march=i686 -fno-strict-aliasing -DMODULE -DMODVERSIONS -include /usr/src/linux-2.4.0-test8/include/linux/modversions.h -c -o pegasus.o pegasus.c
> ../../gcc/function.c:2392: Internal compiler error in function fixup_memory_subreg
> cpp: output pipe has been closed

This is a compiler bug. Better try gcc-2.95.2 (or 2.7.2.3).

--
Marc Lehmann
Re: Disk priorities...
On Sun, Oct 01, 2000 at 03:58:55PM -0700, LA Walsh <[EMAIL PROTECTED]> wrote:
> Specifically, I'm talking about 'nice'd "down" processes -- things

Well, it is difficult to implement (network bandwidth limiting or I/O latency, for example), but asking for it once a year might make it a reality.

OS/2 had a lot of these things in its scheduler, but, according to subjective reports from a lot of people, it didn't seem to work very well (it slowed down the scheduler considerably without ever working great).

--
Marc Lehmann
Re: What is up with Redhat 7.0?
On Mon, Oct 02, 2000 at 12:19:03AM +0200, Martin Dalecki <[EMAIL PROTECTED]> wrote:
> > > on redhat need redhat versions of the development toolchain / runtime
> > > environment to use them :(
>
> Ever tried to recompile SuSE apache from the src.rpm they provide?

We are talking binaries here, but anyway, what you say is easy to do: nobody *forces* you to apply their patches, or even to use their source code. Go and fetch the official apache; it will run just fine.

> THAT is OFFENDING! Not just the fact whatever who want's to be

True, it is offending in some sense, but this is not specific to suse and, while it may be worthwhile on a "bash all distributions" list (or even here ;), it is not the actual point, which is binary incompatibility caused by forked versions for no benefit.

--
Marc Lehmann
Re: What is up with Redhat 7.0?
On Sun, Oct 01, 2000 at 05:18:22PM -0400, Horst von Brand <[EMAIL PROTECTED]> wrote:
> And a "deliberate decision" by a "bunch of guys" (which by some freak
> accident of fate just so happens includes several of the lead people on the
> involved software projects) can't ever be right, or even just be a honest
> mistake. N, it _has_ to be sabotage, planned and executed by His
> Evilness Himself.

Now that'd be an interesting new idea ;)

Anyway, no, there is no conspiracy theory, just a lot of very bad actions by some company in a row that add a lot of extra, unnecessary work and confusion to the free software community.

--
Marc Lehmann
Re: What is up with Redhat 7.0?
On Mon, Oct 02, 2000 at 12:07:11AM +0100, Alan Cox <[EMAIL PROTECTED]> wrote:
> > Why do you keep ignoring this point?
>
> I don't see your point except as 'never change anything'.

Hmm... there is some misunderstanding here, see:

> I got bored of libc2 a while back. I prefer change

Now, what would you think if you developed libc2 and were about to go to libc3, and then some company took libc2, made their own libc3 which is incompatible with the libc3 that was publicly announced some time ago, put *your* address into the bug-report address of *their* libc3, told the public nothing about the highly experimental nature of their libc3 (which will certainly not be compatible with the "official" libc3), etc. etc.?

I certainly am not "never change anything"; I wouldn't have tried to patch that pgcc thingy if I were. I am against mindless forking without stating this, though, even if allowed by the license.

--
Marc Lehmann
Re: What is up with Redhat 7.0?
On Mon, Oct 02, 2000 at 12:41:11AM +0200, Igmar Palsenberg <[EMAIL PROTECTED]> wrote:
> > on redhat need redhat versions of the development toolchain / runtime
> > environment to use them :(
>
> And you say that programs developed on for example SuSE don't need a SuSE
> environment ??

I said that, say that, and it's still true, yes ;) It's also true of the majority of other distributions not cited so far: debian (which has the advantage of reasonable package management), slackware, stampede and many others.

--
Marc Lehmann
Re: What is up with Redhat 7.0?
On Sun, Oct 01, 2000 at 10:36:00PM +0100, Alan Cox <[EMAIL PROTECTED]> wrote:
> > One never needed suse's or redhat's glibc to run binaries created on their
> > platforms. Likewise one never needed their libstdc++ or their toolchain,
>
> You regularly did. Even with libc5 there were two semi incompatible sets
> of X libraries (with/without pthreads) and some other problems. Thats why we
> need the LSB work

You *keep* ignoring the point. Please, Alan, the point is that all these libraries were not forked redhat-only versions. You keep citing irrelevant facts about library incompatibilities, but the fact is that all of these came from the official sources and were compatible with the official versions. Even egcs made a large effort to become gcc compatible.

Why do you keep ignoring this point?

--
Marc Lehmann
Re: What is up with Redhat 7.0?
On Sun, Oct 01, 2000 at 04:39:06PM -0400, Horst von Brand <[EMAIL PROTECTED]> wrote:
> > I wouldn't mind, either, if this didn't mean that programs compiled
> > on redhat need redhat versions of the development toolchain / runtime
> > environment to use them :(
>
> Has happened on and off with each distribution I've ever played with. The
> point being?

That what you say is simply not true, so what's _your_ point in claiming this? One never needed suse's or redhat's glibc to run binaries created on their platforms. Likewise one never needed their libstdc++ or their toolchain; the official ones (released by the official maintainers) were always enough.

--
Marc Lehmann
Re: What is up with Redhat 7.0?
On Sun, Oct 01, 2000 at 09:18:36PM +0200, Martin Dalecki <[EMAIL PROTECTED]> wrote:
> C++ ABI breaking: SuSE managed to break the VShop application in an
> entirely insane way between releases 6.1 and 6.2 - they stupidly did
> recompile the libstdc++ with a new compiler and didn't even
> bother to increment the binary version of this library
> At RedHat at least they know what they are changing...

Obviously redhat did and does a lot of similar braindamage, which could be called "bugs" (no version of perl on redhat CDs really worked correctly, for example).

Again, the choice redhat made cannot be construed as a mistake by some guy or a group of guys. It was a deliberate decision.

--
Marc Lehmann
Re: To Matti
On Sat, Sep 30, 2000 at 11:20:42PM +0200, Marc Lehmann <[EMAIL PROTECTED]> wrote:
> Just FYI; I tried to reply to your mail (you know the topic) but your

Thanks for your reply. O.k., in short: I didn't agree back when you sent the message, but then the thread had more on-topic content. So basically do as you think is best, but think about the political implications, which arise in every case. Killing threads rarely has good results IMHO, compared to other methods.

--
Marc Lehmann
Re: What is up with Redhat 7.0?
On Sun, Oct 01, 2000 at 06:06:52PM +0200, Marc Lehmann <[EMAIL PROTECTED]> wrote:
> > owning Cygnus) is purest garbage. The whole *point* of the Steering
> > Committee is to prevent any single interest from gaining control of
>
> BTW, AFAIK gcc is the only large free software project that has an

"AFAIK" has a very low information content. Alan just informed me that the gnome project has a similar anti-takeover rule (trying to avoid a mail flood here ;)

--
Marc Lehmann
Re: What is up with Redhat 7.0?
On Sun, Oct 01, 2000 at 04:13:25PM +0100, Nix <[EMAIL PROTECTED]> wrote:
> > (Forget about the "committee" stuff...)
>
> Marc will probably agree here that this (except for the bit about RH
> owning Cygnus) is purest garbage. The whole *point* of the Steering
> Committee is to prevent any single interest from gaining control of

BTW, AFAIK gcc is the only large free software project that has an explicit rule that (quote):

* No single organization is allowed to have 50% or more of the votes. [This includes groups of developers from the same company or a university]

The cygnus/redhat merger was indeed a point where this rule had to be checked; fortunately even redhat+cygnus is well below the 50% mark.

But even if it were true, it isn't good.

> It is up to the release manager (following the release criteria) to
> release GCC. It is not up to RedHat. But they can, if they want, ship an
> unreleased GCC.

Yes, they can do whatever they are allowed by the license, of course. The question is whether it's right, or what the consequences are.

--
Marc Lehmann
Re: What is up with Redhat 7.0?
On Sun, Oct 01, 2000 at 03:27:41PM +0200, Martin Dalecki <[EMAIL PROTECTED]> wrote:
> Get real: RedHat owns cygnus and cygnus owns GCC so what do you complain
> about? It's up to them to decide which compiler is stable or which

Now that's the problem. Claiming that redhat owns gcc (which is owned by the FSF) is one of the major points in this discussion. I am sure you just made a joke, but I miss the smileys...

> And then there is [EMAIL PROTECTED] - so what's up with the glibc?

The same, see above :( Go through the changelog and you will see that drepper is by far not the only coder. Hey, I even see @suse in there. A lot! So what's up with glibc? Did you fall for some company's marketing droids? Surely you didn't...

> I can understand redhat somehow. There are good reasons for them to take
> even CVS snaps and ship them instead of *very* outdated so called stable
> versions.

I wouldn't mind, either, if this didn't mean that programs compiled on redhat need redhat versions of the development toolchain / runtime environment to use them :(

--
Marc Lehmann
Re: What is up with Redhat 7.0?
On Sun, Oct 01, 2000 at 01:50:44PM +0300, Matti Aarnio <[EMAIL PROTECTED]> wrote:
> Aside of that pre-processor noise I don't know if 2.96 is really

Please keep in mind that there is no such definite thing as gcc-2.96. There is the redhat version (with unknown changes to the snapshot it is based on) and there are countless fsf snapshots of 2.96. They behave similarly, but not identically, which complicates any discussion about it.

--
Marc Lehmann